Latest Past Events

Ground Truth Bias in External Cluster Validity Indices

ENG 106, 245 Church Street, Toronto, ON

June 28, 2016 at 2:00 p.m. IEEE CIS Distinguished Lecturer James C. Bezdek will be presenting “Ground Truth Bias in External Cluster Validity Indices”.

Speaker: James C. Bezdek, IEEE CIS Distinguished Lecturer
Day & Time: Tuesday, June 28, 2016, 2:00 p.m. – 4:00 p.m.
Location: Room ENG 106, George Vari Engineering & Computing Centre, 245 Church St., Toronto, ON, M5B 2K3 (intersection of Church and Gould)
Map: http://www.ryerson.ca/maps/
Contact: Dr. Maryam Davoudpour, Dr. Glaucio Carvalho, Dr. Alireza Sadeghian
Organizers: Signals & Computational Intelligence Chapter, Magnetics Chapter, Instrumentation & Measurement/Robotics & Automation Chapter

Abstract: This talk begins with a short review of clustering that emphasizes external cluster validity indices (CVIs). A method for generalizing external pair-based CVIs (e.g., the crisp Rand and Jaccard indices) to evaluate soft partitions is described and illustrated. Three types of validation experiments conducted with synthetic and real-world labeled data are discussed: “best c” (internal validation with labeled data), and “best I/E” (agreement between an internal and external CVI pair). As is always the case in cluster validity, conclusions based on empirical evidence are at the mercy of the data, so the reported results might be invalid for different data sets and/or clustering models and algorithms. But much more importantly, we discovered during these tests that some external cluster validity indices are also at the mercy of the distribution of the ground truth itself. We believe that our study of this surprising fact is the first systematic analysis of a largely unknown but very important problem: bias due to the distribution of the ground truth partition.

Specifically, in addition to the well-known bias in many external CVIs caused by monotonic dependency on c, the number of clusters in candidate partitions, there are two additional kinds of bias that can be caused by an unusual distribution of the clusters in the ground truth partition provided with labeled data. The most important ground truth bias is caused by imbalance (unequally sized labeled subsets). We demonstrate these effects with randomized experiments on 25 pair-based external CVIs. Then we provide a theoretical analysis of bias due to ground truth for several CVIs by relating Rand’s index to the Havrda-Charvat quadratic entropy.

Biography: Jim received the PhD in Applied Mathematics from Cornell University in 1973. Jim is past president of NAFIPS (North American Fuzzy Information Processing Society), IFSA (International Fuzzy Systems Association) and the IEEE CIS (Computational Intelligence Society); founding editor of the International Journal of Approximate Reasoning and the IEEE Transactions on Fuzzy Systems; Life Fellow of the IEEE and IFSA; and a recipient of the IEEE 3rd Millennium, IEEE CIS Fuzzy Systems Pioneer, and IEEE technical field award Rosenblatt medals. Jim’s interests: woodworking, optimization, motorcycles, pattern recognition, cigars, clustering in very large data, fishing, co-clustering, blues music, wireless sensor networks, poker and visual clustering. And of course, clustering in big data. Jim retired in 2007, and will be coming to a university near you soon.
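
For readers unfamiliar with the pair-based indices named in the abstract, the following is a minimal sketch of the crisp Rand and Jaccard indices, computed by counting object pairs. It is not the soft generalization described in the talk, and the function names and toy labels are illustrative assumptions rather than anything taken from the speaker's work.

```python
from itertools import combinations

def pair_counts(truth, candidate):
    """Classify every object pair by whether each labeling co-clusters it."""
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_cand = candidate[i] == candidate[j]
        if same_truth and same_cand:
            a += 1          # together in both partitions
        elif same_truth:
            b += 1          # together in ground truth only
        elif same_cand:
            c += 1          # together in candidate only
        else:
            d += 1          # apart in both partitions
    return a, b, c, d

def rand_index(truth, candidate):
    a, b, c, d = pair_counts(truth, candidate)
    return (a + d) / (a + b + c + d)

def jaccard_index(truth, candidate):
    # Undefined (division by zero) when no pair is co-clustered in either partition.
    a, b, c, _ = pair_counts(truth, candidate)
    return a / (a + b + c)

# Hypothetical toy labels: a balanced 2-cluster ground truth and a 3-cluster candidate.
truth = [0, 0, 0, 1, 1, 1]
candidate = [0, 0, 1, 1, 2, 2]
print(pair_counts(truth, candidate))
print(rand_index(truth, candidate), jaccard_index(truth, candidate))
```

Because the Rand index counts pairs that are apart in both partitions (d) as agreement while the Jaccard index ignores them, the two can respond quite differently when the sizes of the ground truth clusters change.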

Semi-automated Genome Annotation and an Expanded Epigenetic Alphabet

Room LG04, George Vari Engineering and Computing Centre, Ryerson University, Toronto

Thursday, February 11, 2016 at 1:00 p.m. Michael Hoffman, Principal Investigator at Princess Margaret Cancer Centre and Assistant Professor in the Departments of Medical Biophysics and Computer Science, University of Toronto, will be presenting “Semi-automated genome annotation and an expanded epigenetic alphabet”.

Speaker: Michael Hoffman, Principal Investigator at Princess Margaret Cancer Centre and Assistant Professor in the Departments of Medical Biophysics and Computer Science, University of Toronto
Day & Time: Thursday, February 11, 2016, 1:00 p.m. – 2:00 p.m.
Location: Room LG04, George Vari Engineering and Computing Centre, Ryerson University, Toronto, M5B 1Z4. Please check before the seminar.
Contact: llivi@scs.ryerson.ca

Abstract: First, we will discuss Segway, an integrative method to identify patterns from multiple functional genomics experiments, discovering joint patterns across different assay types. We apply Segway to ENCODE ChIP-seq and DNase-seq data and identify patterns associated with transcription start sites, gene ends, enhancers, CTCF elements, and repressed regions. Segway yields a model which elucidates the relationship between assay observations and functional elements in the genome.

Second, we will discuss a new method to discover transcription factor motifs and identify transcription factor binding sites in DNA with covalent modifications such as methylation. Just as transcription factors distinguish one standard nucleobase from another, they also distinguish unmodified and modified bases. To represent the modified bases in a sequence, we replace cytosine (C) with symbols for 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), and 5-formylcytosine (5fC). Similarly, we adapted the well-established position weight matrix model of transcription factor binding affinity to an expanded alphabet. We created an expanded-alphabet genome sequence using genome-wide maps of 5mC, 5hmC, and 5fC in mouse embryonic stem cells. Using this sequence and expanded-alphabet position weight matrices, we reproduced various known methylation binding preferences, including the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated motifs. Using these known binding preferences to tune model parameters enables discovery of novel modified motifs.

Biography: Michael Hoffman is a principal investigator at the Princess Margaret Cancer Centre and Assistant Professor in the Departments of Medical Biophysics and Computer Science, University of Toronto. He researches the application of machine learning techniques to epigenomic data. He previously led the National Institutes of Health ENCODE Project’s large-scale integration task group while at the University of Washington. He has a PhD from the University of Cambridge, where he conducted computational genomics studies at the European Bioinformatics Institute. He also has a B.S. in Biochemistry and a B.A. in the Plan II Honors Program at The University of Texas at Austin. He was named a Genome Technology Young Investigator and has received several awards for his academic work, including an NIH K99/R00 Pathway to Independence Award.
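
The expanded-alphabet position weight matrix idea in the abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the one-letter symbols `m`, `h` and `f` for 5mC, 5hmC and 5fC, the uniform background, the pseudocount, and the toy motif counts. It is not the speaker's software or parameterization.

```python
import math

# Standard bases plus hypothetical one-letter symbols for modified cytosines:
# m = 5mC, h = 5hmC, f = 5fC (assumed notation for this sketch).
ALPHABET = "ACGTmhf"

def log_odds_pwm(counts, pseudocount=1.0):
    """Turn per-position symbol counts into log2-odds against a uniform background."""
    background = 1.0 / len(ALPHABET)
    pwm = []
    for column in counts:
        total = sum(column.get(s, 0) for s in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append({
            s: math.log2((column.get(s, 0) + pseudocount) / total / background)
            for s in ALPHABET
        })
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for a window the same width as the matrix."""
    return sum(column[symbol] for column, symbol in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

# Toy counts for a made-up 4-position motif that prefers a methylated third base.
toy_counts = [{"T": 8}, {"G": 8}, {"m": 7, "C": 1}, {"G": 8}]
pwm = log_odds_pwm(toy_counts)

print(best_hit(pwm, "AATGmGAA"))  # methylated site
print(best_hit(pwm, "AATGCGAA"))  # same site, unmodified cytosine
```

With these made-up counts the best window starts at the same position in both sequences, but the methylated version scores higher, which is the kind of preference an expanded alphabet makes expressible.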

Every Picture Tells a Story: Visual Cluster Assessment in Square and Rectangular Relational Data

Room 1180, Bahen Centre for Information Technology, University of Toronto

Monday, December 7, 2015 at 4:00 p.m. Professor Emeritus James Bezdek will be presenting “Every Picture Tells a Story: Visual Cluster Assessment in Square and Rectangular Relational Data”.

Speaker: Professor Emeritus James Bezdek, Past President of NAFIPS, IFSA and the IEEE CIS
Day & Time: Monday, December 7, 2015, 4:00 p.m. – 6:00 p.m.
Location: Room 1180, Bahen Centre for Information Technology, 40 St. George Street, Toronto
Organizer: IEEE Toronto Signals & Computational Intelligence Chapter, Distinguished Lecturer Program
Contact: Lorenzo Livi, Email: llivi@scs.ryerson.ca

Abstract: The VAT/iVAT algorithms are the parents of a large family of visual assessment models.
Part 1. Definitions of the three canonical problems of cluster analysis: tendency assessment, clustering, and cluster validity. History of Visual Clustering. Applications: role-based compliance assessment, eldercare time series data, and anomaly detection in wireless sensor networks.
Part 2. Extension to siVAT, scalable iVAT for big data. This is the basis of clusiVAT and clusiVAT+ for clustering in big data (Topic 4 below). Application: image segmentation. Extension to coiVAT for assessment of co-clustering tendency in the four clustering problems associated with rectangular relational data. Application: response of 18 Fetal Bovine Serum Treatments to the treatment of fibroblasts in gene expression data.

Biography: Jim received the PhD in Applied Mathematics from Cornell University in 1973. Jim is past president of NAFIPS (North American Fuzzy Information Processing Society), IFSA (International Fuzzy Systems Association) and the IEEE CIS (Computational Intelligence Society); founding editor of the International Journal of Approximate Reasoning and the IEEE Transactions on Fuzzy Systems; Life Fellow of the IEEE and IFSA; and a recipient of the IEEE 3rd Millennium, IEEE CIS Fuzzy Systems Pioneer, and IEEE technical field award Rosenblatt medals. Jim’s interests: woodworking, optimization, motorcycles, pattern recognition, cigars, clustering in very large data, fishing, co-clustering, blues music, wireless sensor networks, poker and visual clustering. And of course, clustering in big data. Jim retired in 2007, and will be coming to a university near you soon.
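
As background for the VAT family mentioned in the abstract, here is a minimal sketch of the basic VAT reordering step as it is commonly described: start from an object involved in the largest dissimilarity, then repeatedly append the unvisited object closest to any visited one (a Prim-like ordering). It covers only plain VAT on a square dissimilarity matrix, not iVAT, siVAT or coiVAT, and the function names and toy data are assumptions made for illustration.

```python
def vat_order(D):
    """Return a VAT-style ordering of objects for a square dissimilarity matrix D."""
    n = len(D)
    # Seed with one endpoint of the largest dissimilarity in D.
    start, _ = max(((i, j) for i in range(n) for j in range(n)),
                   key=lambda ij: D[ij[0]][ij[1]])
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Append the unvisited object nearest to any already-ordered object.
        _, nearest = min((D[p][q], q) for p in order for q in remaining)
        order.append(nearest)
        remaining.remove(nearest)
    return order

def reorder(D, order):
    """Permute rows and columns of D so it can be displayed as a VAT image."""
    return [[D[p][q] for q in order] for p in order]

# Hypothetical 1-D points from two groups, deliberately interleaved in the input.
points = [0.0, 10.0, 0.5, 10.5, 1.0]
D = [[abs(x - y) for y in points] for x in points]

order = vat_order(D)
print(order)
for row in reorder(D, order):
    print(row)
```

In the reordered matrix, objects from the same group sit next to each other, so small dissimilarities gather into blocks along the diagonal; displayed as a grayscale image, those blocks are what suggest cluster tendency.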