-
On the Right Track! Analysing and Predicting Navigation Success in Wikipedia. Koopmann, Tobias; Dallmann, Alexander; Hettinger, Lena; Niebler, Thomas; Hotho, Andreas in HT '19 (2019). 143--152.
-
Detection of Scenes in Fiction. Gius, Evelyn; Jannidis, Fotis; Krug, Markus; Zehe, Albin; Hotho, Andreas; Puppe, Frank; Krebs, Jonathan; Reiter, Nils; Wiedmer, Nathalie; Konle, Leonard (2019).
-
Flow-based network traffic generation using Generative Adversarial Networks. Ring, Markus; Schlör, Daniel; Landes, Dieter; Hotho, Andreas in Computers & Security (2019). 82 156 - 172.
Flow-based data sets are necessary for evaluating network-based intrusion detection systems (NIDS). In this work, we propose a novel methodology for generating realistic flow-based network traffic. Our approach is based on Generative Adversarial Networks (GANs) which achieve good results for image generation. A major challenge lies in the fact that GANs can only process continuous attributes. However, flow-based data inevitably contain categorical attributes such as IP addresses or port numbers. Therefore, we propose three different preprocessing approaches for flow-based data in order to transform them into continuous values. Further, we present a new method for evaluating the generated flow-based network traffic which uses domain knowledge to define quality tests. We use the three approaches for generating flow-based network traffic based on the CIDDS-001 data set. Experiments indicate that two of the three approaches are able to generate high quality data.
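As a rough illustration of what turning categorical flow attributes into continuous values can look like, the sketch below maps an IPv4 address to four values in [0, 1] and a port number to a 16-bit vector. Both encodings and both function names are assumptions made for illustration; the abstract does not spell out the three preprocessing approaches, so this should not be read as the paper's method.

```python
import ipaddress


def ip_to_unit_interval(ip):
    """Map each IPv4 octet to [0, 1] (illustrative numeric encoding)."""
    packed = ipaddress.IPv4Address(ip).packed  # the four raw address bytes
    return [octet / 255 for octet in packed]


def port_to_bits(port):
    """Encode a port number as 16 binary values, most significant bit first."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in 0..65535")
    return [(port >> shift) & 1 for shift in range(15, -1, -1)]
```

Either representation removes the arbitrary ordering that a raw integer encoding would impose on addresses and ports, which is the kind of problem the abstract attributes to categorical attributes.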
-
Team Xenophilius Lovegood at SemEval-2019 Task 4: Hyperpartisanship Classification using Convolutional Neural Networks. Zehe, Albin; Hettinger, Lena; Ernst, Stefan; Hauptmann, Christian; Hotho, Andreas (2019).
-
EClaiRE: Context Matters! – Comparing Word Embeddings for Relation Classification. Hettinger, Lena; Zehe, Albin; Dallmann, Alexander; Hotho, Andreas. K. David, Geihs, K., Lange, M., Stumme, G. (eds.) (2019). 191-204.
-
Big-Data Helps SDN to Improve Application Specific Quality of Service. Schwarzmann, Susanna; Blenk, Andreas; Dobrijevic, Ognjen; Jarschel, Michael; Hotho, Andreas; Zinner, Thomas; Wamser, Florian in Big Data and Software Defined Networks (2018).
This chapter first provides an outline of the current results in the domains of: (a) quality-of-service (QoS) / quality-of-experience (QoE) control and management (CaM) for real-time multimedia services that is supported by software-defined networking (SDN), and (b) big data analytics and methods that are used for QoS/QoE CaM. Then, three specific use case scenarios with respect to video streaming services are presented, so as to illustrate the expected benefits of incorporating big data analytics into SDN-based CaM for the purposes of improving or optimizing QoS/QoE. In the end, we describe our vision and a high-level view of an SDN-based architecture for QoS/QoE CaM that is enriched with big data analytics' functional blocks and summarize corresponding challenges.
-
Burrows Zeta: Varianten und Evaluation. Schöch, Christof; Calvo, José; Zehe, Albin; Hotho, Andreas (2018).
-
Analysing Direct Speech in German Novels. Jannidis, Fotis; Konle, Leonard; Zehe, Albin; Hotho, Andreas; Krug, Markus (2018).
-
ClaiRE at SemEval-2018 Task 7: Classification of Relations using Embeddings. Hettinger, Lena; Dallmann, Alexander; Zehe, Albin; Niebler, Thomas; Hotho, Andreas (2018).
-
ClaiRE at SemEval-2018 Task 7 - Extended Version. Hettinger, Lena; Dallmann, Alexander; Zehe, Albin; Niebler, Thomas; Hotho, Andreas (2018).
In this paper we describe our post-evaluation results for SemEval-2018 Task 7 on classification of semantic relations in scientific literature for clean (subtask 1.1) and noisy data (subtask 1.2). Due to space limitations we publish an extended version of Hettinger et al. (2018) including further technical details and changes made to the preprocessing step in the post-evaluation phase. Due to these changes Classification of Relations using Embeddings (ClaiRE) achieved an improved F1 score of 75.11% for the first subtask and 81.44% for the second.
-
Healing Time Correlates With the Quality of Scaring: Results From a Prospective Randomized Control Donor Site Trial. Werdin, Frank; Tenenhaus, Mayer; Becker, Martin; Rennekampff, Hans-Oliver in Dermatologic Surgery (2018). 44(4) 521--527.
-
A White-Box Model for Detecting Author Nationality by Linguistic Differences in Spanish Novels. Zehe, Albin; Schlör, Daniel; Henny-Krahmer, Ulrike; Becker, Martin; Hotho, Andreas (2018).
-
Accessing Information with Tags: Search and Ranking. Navarro Bullock, Beate; Hotho, Andreas; Stumme, Gerd in Social Information Access: Systems and Technologies, P. Brusilovsky, He, D. (eds.) (2018). 310--343.
With the growth of the Social Web, a variety of new web-based services arose and changed the way users interact with the internet and consume information. One central phenomenon was and is tagging, which allows users to manage, organize and access information in social systems. Tagging helps to manage all kinds of resources, making their access much easier. The first type of social tagging systems were social bookmarking systems, i.e., platforms for storing and sharing bookmarks on the web rather than just in the browser. Meanwhile, (hash-)tagging is central in many other Social Media systems such as social networking sites and micro-blogging platforms. To allow for efficient information access, special algorithms have been developed to guide the user, to search for information and to rank the content based on tagging information contributed by the users.
-
Burrows’ Zeta: Exploring and Evaluating Variants and Parameters. Schöch, Christof; Schlör, Daniel; Zehe, Albin; Gebhard, Henning; Becker, Martin; Hotho, Andreas (2018). 274-277.
-
EveryAware Gears: A Tool to visualize and analyze all types of Citizen Science Data. Lautenschlager, Florian; Becker, Martin; Steininger, Michael; Hotho, Andreas. D. Burghardt, Chen, S., Andrienko, G., Andrienko, N., Purves, R., Diehl, A. (eds.) (2018).
-
Air Trails -- Urban Air Quality Campaign Exploration Patterns. Becker, Martin; Lautenschlager, Florian; Hotho, Andreas (2018).
-
pysubgroup: Easy-to-Use Subgroup Discovery in Python. Lemmerich, Florian; Becker, Martin in Lecture Notes in Computer Science, U. Brefeld, Curry, E., Daly, E., MacNamee, B., Marascu, A., Pinelli, F., Berlingerio, M., Hurley, N. (eds.) (2018). (Vol. 11053) 658-662.
-
Flow-based Network Traffic Generation using Generative Adversarial Networks. Ring, Markus; Schlör, Daniel; Landes, Dieter; Hotho, Andreas in CoRR (2018). abs/1810.07795
Flow-based data sets are necessary for evaluating network-based intrusion detection systems (NIDS). In this work, we propose a novel methodology for generating realistic flow-based network traffic. Our approach is based on Generative Adversarial Networks (GANs) which achieve good results for image generation. A major challenge lies in the fact that GANs can only process continuous attributes. However, flow-based data inevitably contain categorical attributes such as IP addresses or port numbers. Therefore, we propose three different preprocessing approaches for flow-based data in order to transform them into continuous values. Further, we present a new method for evaluating the generated flow-based network traffic which uses domain knowledge to define quality tests. We use the three approaches for generating flow-based network traffic based on the CIDDS-001 data set. Experiments indicate that two of the three approaches are able to generate high quality data.
-
Adaptive kNN Using Expected Accuracy for Classification of Geo-spatial Data. Kibanov, Mark; Becker, Martin; Müller, Juergen; Atzmüller, Martin; Hotho, Andreas; Stumme, Gerd in SAC '18 (2018). 857--865.
The k-Nearest Neighbor (kNN) classification approach is conceptually simple, yet widely applied since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the accuracy of a set of structurally similar observations. An arbitrary similarity function can be used to find these observations. We introduce and evaluate different similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data. Each classification task consists of (tens of) thousands of items. We demonstrate that the presented expected accuracy measures can be a good estimator for kNN performance, and that the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. Also, we show that the range of considered k can be significantly reduced to speed up the algorithm without negative influence on classification accuracy.
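The idea of choosing k per instance by expected accuracy can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the "structurally similar observations" are taken to be the m nearest training points (one possible similarity function), expected accuracy is estimated by leave-one-out kNN on those points, and all names and default parameters are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train, query, k, skip=None):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    ranked = sorted(
        (i for i in range(len(train)) if i != skip),
        key=lambda i: math.dist(train[i][0], query),
    )
    votes = Counter(train[i][1] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def adaptive_knn_predict(train, query, candidate_ks=(1, 3, 5), m=5):
    """Pick k per query by maximizing accuracy on m similar observations."""
    # Assumption: "structurally similar" means the m nearest training points.
    similar = sorted(
        range(len(train)), key=lambda i: math.dist(train[i][0], query)
    )[:m]
    best_k, best_acc = candidate_ks[0], -1.0
    for k in candidate_ks:
        # Leave-one-out accuracy of kNN with this k on the similar points.
        hits = sum(
            knn_predict(train, train[i][0], k, skip=i) == train[i][1]
            for i in similar
        )
        acc = hits / len(similar)
        if acc > best_acc:  # ties keep the smaller (earlier) k
            best_k, best_acc = k, acc
    return knn_predict(train, query, best_k), best_k
```

Here `train` is a list of `(point, label)` pairs, with each point a tuple of coordinates. Restricting `candidate_ks` to a short range mirrors the abstract's remark that the range of considered k can be reduced to speed up the algorithm.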
-
Detection of slow port scans in flow-based network traffic. Ring, Markus; Landes, Dieter; Hotho, Andreas in PLOS ONE (2018). 13(9) 1-18.
Frequently, port scans are early indicators of more serious attacks. Unfortunately, the detection of slow port scans in company networks is challenging due to the massive amount of network data. This paper proposes an innovative approach for preprocessing flow-based data which is specifically tailored to the detection of slow port scans. The preprocessing chain generates new objects based on flow-based data aggregated over time windows while taking domain knowledge as well as additional knowledge about the network structure into account. The computed objects are used as input for the further analysis. Based on these objects, we propose two different approaches for detection of slow port scans. One approach is unsupervised and uses sequential hypothesis testing whereas the other approach is supervised and uses classification algorithms. We compare both approaches with existing port scan detection algorithms on the flow-based CIDDS-001 data set. Experiments indicate that the proposed approaches achieve better detection rates and exhibit fewer false alarms than similar algorithms.
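To make "sequential hypothesis testing" concrete, the following sketch applies Wald's generic sequential probability ratio test to a stream of connection outcomes from one source. It is an assumption-laden illustration: the success probabilities, error rates, and function name are hypothetical, and the paper's aggregated objects over time windows are not modelled here.

```python
import math


def sprt_classify(outcomes, p_success_benign=0.8, p_success_scanner=0.2,
                  alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over connection outcomes.

    outcomes: iterable of booleans, True if a connection attempt succeeded.
    Returns "scanner", "benign", or "undecided" if no threshold was crossed.
    """
    upper = math.log((1 - beta) / alpha)  # decide "scanner" at or above this
    lower = math.log(beta / (1 - alpha))  # decide "benign" at or below this
    log_ratio = 0.0
    for success in outcomes:
        if success:
            log_ratio += math.log(p_success_scanner / p_success_benign)
        else:
            log_ratio += math.log(
                (1 - p_success_scanner) / (1 - p_success_benign)
            )
        if log_ratio >= upper:
            return "scanner"
        if log_ratio <= lower:
            return "benign"
    return "undecided"
```

Because the test only accumulates evidence per observation and does not depend on how quickly observations arrive, it is a natural fit for scans that are spread over long periods.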
-
Mining social semantics on the social web. Hotho, A.; Jaeschke, R.; Lerman, K. in Semantic Web (2017). 8(5) 623--624.
-
A Bayesian Method for Comparing Hypotheses About Human Trails. Singer, Philipp; Helic, Denis; Hotho, Andreas; Strohmaier, Markus in ACM Trans. Web (2017). 11(3) 14:1--14:29.
When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music play lists. Understanding the factors that drive the production of these trails can be useful, for example, for improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a method called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our method utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to calculate the evidence of the data under them. For eliciting Dirichlet priors from hypotheses, we present an adaption of the so-called (trial) roulette method, and to compare the relative plausibility of hypotheses, we employ Bayes factors. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including Web site navigation, business reviews, and online music played. Our work expands the repertoire of methods available for studying human trails.
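The core computation described above, the evidence of a Markov chain under a Dirichlet prior, can be sketched in a few lines. The marginal likelihood formula is the standard Dirichlet-multinomial one; the elicitation step is a deliberately simplified stand-in for the (trial) roulette method (a uniform pseudo-count of 1 plus `kappa` times the hypothesized probability), and the function names are hypothetical.

```python
import math


def log_evidence(counts, alpha):
    """Log marginal likelihood of a first-order Markov chain.

    counts[i][j]: observed transitions from state i to state j.
    alpha[i][j]:  Dirichlet pseudo-counts for the same transition.
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        total += math.lgamma(sum(a_row)) - sum(math.lgamma(a) for a in a_row)
        total += sum(math.lgamma(n + a) for n, a in zip(n_row, a_row))
        total -= math.lgamma(sum(n_row) + sum(a_row))
    return total


def elicit_prior(hypothesis, kappa):
    """Simplified elicitation: uniform pseudo-count 1 plus kappa * belief."""
    return [[1.0 + kappa * p for p in row] for row in hypothesis]


# Toy data: trails that almost always alternate between two states.
counts = [[0, 9], [9, 0]]
alternate = [[0.1, 0.9], [0.9, 0.1]]  # hypothesis: users switch state
stay = [[0.9, 0.1], [0.1, 0.9]]       # hypothesis: users stay put

# Log Bayes factor; positive values favor the "alternate" hypothesis.
log_bayes_factor = (
    log_evidence(counts, elicit_prior(alternate, kappa=10))
    - log_evidence(counts, elicit_prior(stay, kappa=10))
)
```

Larger values of `kappa` express stronger belief in a hypothesis, so comparing hypotheses across a range of `kappa` values shows how the ranking behaves as the priors become more informative.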
-
Flow-based benchmark data sets for intrusion detection. Ring, Markus; Wunderlich, Sarah; Grüdl, Dominik; Landes, Dieter; Hotho, Andreas (2017). 361--369.
-
Comparing Hypotheses About Sequential Data: A Bayesian Approach and Its Applications. Lemmerich, Florian; Singer, Philipp; Becker, Martin; Espin-Noboa, Lisette; Dimitrov, Dimitar; Helic, Denis; Hotho, Andreas; Strohmaier, Markus (2017). 354--357.
-
Creation of Flow-Based Data Sets for Intrusion Detection. Ring, Markus; Wunderlich, Sarah; Grüdl, Dominik; Landes, Dieter; Hotho, Andreas in Journal of Information Warfare (2017). 16(4) 41-54.
Publicly available labelled data sets are necessary for evaluating anomaly-based Intrusion Detection Systems (IDS). However, existing data sets are often not up-to-date or not yet published because of privacy concerns. This paper identifies requirements for good data sets and proposes an approach for their generation. The key idea is to use a test environment and emulate realistic user behaviour with parameterised scripts on the clients. Comprehensive logging mechanisms provide additional information which may be used for a better understanding of the inner dynamics of an IDS. Finally, the proposed approach is used to generate the flow-based CIDDS-002 data set.
-
Improving Session Recommendation with Recurrent Neural Networks by Exploiting Dwell Time. Dallmann, Alexander; Grimm, Alexander; Pölitz, Christian; Zoller, Daniel; Hotho, Andreas in CoRR (2017). abs/1706.10231
-
Sedentary Behavior among National Elite Rowers during Off-Training—A Pilot Study. Sperlich, Billy; Becker, Martin; Hotho, Andreas; Wallmann-Sperlich, Birgit; Sareban, Mahdi; Winkert, Kay; Steinacker, Jürgen M.; Treff, Gunnar in Frontiers in Physiology (2017). 8 655.
The aim of this pilot study was to analyze the off-training physical activity (PA) profile in national elite German U23 rowers during 31 days of their preparation period. The hours spent in each PA category (i.e. sedentary: <1.5 MET; light physical activity: 1.5–3 MET; moderate physical activity: 3–6 MET and vigorous intense physical activity: >6 MET) were calculated for every valid day (i.e. > 480 min of wear time). The off-training PA during 21 weekdays and 10 weekend days of the final 11-wk preparation period was assessed by a wrist-worn multisensory device (Microsoft Band II (MSBII)). A total of 11 rowers provided valid data (i.e. > 480 min/day) for 11.6 weekdays and 4.8 weekend days during the 31-day observation period. The average sedentary time was 11.63±1.25 hours per day during the week and 12.49±1.10 hours per day on the weekend, with a tendency to be higher on the weekend compared to weekdays (p = 0.06; d = 0.73). The average time in light, moderate and vigorous PA during the weekdays was 1.27±1.15, 0.76±0.37, 0.51±0.44 hours per day and 0.67±0.43, 0.59±0.37, 0.53±0.32 hours per weekend day. Light physical activity was higher during weekdays compared to the weekend (p = 0.04; d = 0.69). Based on our pilot study of eleven national elite rowers we conclude that rowers display a considerable sedentary off-training behavior of more than 11.5 hours/day.
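The categorization rules quoted in the abstract can be written down directly. The sketch below is illustrative only: it assumes one MET value per minute of wear time (a hypothetical data layout), and because the abstract leaves the boundary at 3 MET ambiguous, it assigns exactly 3 MET to "moderate".

```python
from collections import Counter

CATEGORIES = ("sedentary", "light", "moderate", "vigorous")


def pa_category(met):
    """Map a MET value to the activity categories listed in the abstract."""
    if met < 1.5:
        return "sedentary"
    if met < 3:       # assumption: exactly 3 MET counts as moderate
        return "light"
    if met <= 6:      # vigorous is defined as strictly above 6 MET
        return "moderate"
    return "vigorous"


def is_valid_day(wear_minutes):
    """A day is valid if the device was worn for more than 480 minutes."""
    return wear_minutes > 480


def hours_per_category(minute_mets):
    """Hours spent in each category, given one MET value per minute."""
    minutes = Counter(pa_category(m) for m in minute_mets)
    return {cat: minutes[cat] / 60 for cat in CATEGORIES}
```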
-
A Toolset for Intrusion and Insider Threat Detection. Ring, Markus; Wunderlich, Sarah; Grüdl, Dominik; Landes, Dieter; Hotho, Andreas in Data Analytics and Decision Support for Cybersecurity: Trends, Methodologies and Applications, I. Palomares Carrascosa, Kalutarage, H. K., Huang, Y. (eds.) (2017). 3--31.
Company data are a valuable asset and must be protected against unauthorized access and manipulation. In this contribution, we report on our ongoing work that aims to support IT security experts with identifying novel or obfuscated attacks in company networks, irrespective of their origin inside or outside the company network. A new toolset for anomaly based network intrusion detection is proposed. This toolset uses flow-based data which can be easily retrieved by central network components. We study the challenges of analysing flow-based data streams using data mining algorithms and build an appropriate approach step by step. In contrast to previous work, we collect flow-based data for each host over a certain time window, include the knowledge of domain experts and analyse the data from three different views. We argue that incorporating expert knowledge and previous flows allow us to create more meaningful attributes for subsequent analysis methods. This way, we try to detect novel attacks while simultaneously limiting the number of false positives.
-
Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning. Niebler, Thomas; Becker, Martin; Pölitz, Christian; Hotho, Andreas (2017).
-
Towards Sentiment Analysis on German Literature. Zehe, Albin; Becker, Martin; Jannidis, Fotis; Hotho, Andreas (2017).
-
Eleven-Week Preparation Involving Polarized Intensity Distribution Is Not Superior to Pyramidal Distribution in National Elite Rowers. Treff, Gunnar; Winkert, Kay; Sareban, Mahdi; Steinacker, Jürgen M.; Becker, Martin; Sperlich, Billy in Frontiers in Physiology (2017). 8 515.
Polarized (POL) training intensity distribution (TID) emphasizes high-volume low-intensity exercise in zone (Z)1 (< first lactate threshold) with a greater proportion of high-intensity Z3 (> second lactate threshold) compared to Z2 (between first and second lactate threshold). In highly trained rowers there is a lack of prospective controlled evidence whether POL is superior to pyramidal (PYR; i.e. greater volume in Z1 vs. Z2 vs. Z3) TID. The aim of the study was to compare the effect of POL vs. PYR TID in rowers during an 11-wk preparation period. Fourteen national elite male rowers participated (age: 20 ± 2 years, maximal oxygen uptake (⩒O2max): 66±5 mL/min/kg). The sample was split into PYR and POL by varying the percentage spent in Z2 and Z3 while Z1 was clamped to ~93% and matched for total and rowing volume. Actual TIDs were based on time within heart rate zones (Z1 and Z2) and duration of Z3-intervals. The main outcome variables were average power in 2000 m ergometer-test (P2000m), power associated with 4 mmol/L [blood lactate] (P4[BLa]), and ⩒O2max. To quantify the level of polarization, we calculated a Polarization-Index as log (%Z1 x %Z3/%Z2). PYR and POL did not significantly differ regarding rowing or total volume, but POL had a higher percentage of Z3 intensities (6±3% vs. 2±1%; p < .005) while Z2 was lower (1±1% vs. 3±2%; p < .05) and Z1 was similar (94±3% vs. 93±2%, p = .37). Consequently, Polarization-Index was significantly higher in POL (3.0±0.7 a.u. vs. 1.9±0.4 a.u.; p < .01). P2000m did not significantly change with PYR (1.5±1.7%, p = .06) nor POL (1.5±2.6%, p = .26). ⩒O2max did not change (1.7±5.6%, p = .52 or 0.6±2.6, p = .67) and a small increase in P4[BLa] was observed in PYR only (1.9±4.8%, p = .37 or -0.5±4.1%, p = .77). Changes from pre to post were not significantly different between groups in any performance measure. POL did not prove to be superior to PYR, possibly due to the high and very similar percentage of Z1 in this study.
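The Polarization-Index formula from the abstract is easy to evaluate. The sketch below assumes a base-10 logarithm (the abstract only writes "log") and a hypothetical function name; how zero time in Z2 or Z3 should be handled is not stated in the abstract, so the sketch simply rejects such input.

```python
import math


def polarization_index(pct_z1, pct_z2, pct_z3):
    """Polarization-Index = log10(%Z1 * %Z3 / %Z2), in arbitrary units."""
    if pct_z2 <= 0 or pct_z3 <= 0:
        # Assumption: the index is left undefined when Z2 or Z3 is empty.
        raise ValueError("Z2 and Z3 percentages must be positive")
    return math.log10(pct_z1 * pct_z3 / pct_z2)


# Group mean percentages quoted in the abstract (Z1, Z2, Z3).
pol = polarization_index(94, 1, 6)
pyr = polarization_index(93, 3, 2)
```

Plugging in the group means gives roughly 2.75 for POL and 1.79 for PYR. These differ slightly from the reported 3.0 and 1.9, which is expected if the reported values are means of per-athlete indices rather than the index of the mean percentages.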
-
Leveraging User-Interactions for Time-Aware Tag Recommendations. Zoller, Daniel; Doerfel, Stephan; Pölitz, Christian; Hotho, Andreas in CEUR Workshop Proceedings (2017).
For the popular task of tag recommendation, various (complex) approaches have been proposed. Recently however, research has focused on heuristics with low computational effort and particularly, a time-aware heuristic, called BLL, has been shown to compare well to various state-of-the-art methods. Here, we follow up on these results by presenting another time-aware approach leveraging user interaction data in an easily interpretable, on-the-fly computable approach that can successfully be combined with BLL. We investigate the influence of time as a parameter in that approach, and we demonstrate the effectiveness of the proposed method using two datasets from the popular public social tagging system BibSonomy.
-
Participatory sensing, opinions and collective awareness. Loreto, Vittorio; Haklay, Mordechai; Hotho, Andreas; Servedio, Vito C. P.; Stumme, Gerd; Theunis, Jan; Tria, Francesca (2017). Springer.
-
Learning Semantic Relatedness From Human Feedback Using Metric Learning. Niebler, Thomas; Becker, Martin; Pölitz, Christian; Hotho, Andreas (2017).
Assessing the degree of semantic relatedness between words is an important task with a variety of semantic applications, such as ontology learning for the Semantic Web, semantic search or query expansion. To accomplish this in an automated fashion, many relatedness measures have been proposed. However, most of these metrics only encode information contained in the underlying corpus and thus do not directly model human intuition. To solve this, we propose to utilize a metric learning approach to improve existing semantic relatedness measures by learning from additional information, such as explicit human feedback. For this, we argue to use word embeddings instead of traditional high-dimensional vector representations in order to leverage their semantic density and to reduce computational cost. We rigorously test our approach on several domains including tagging data as well as publicly available embeddings based on Wikipedia texts and navigation. Human feedback about semantic relatedness for learning and evaluation is extracted from publicly available datasets such as MEN or WS-353. We find that our method can significantly improve semantic relatedness measures by learning from additional information, such as explicit human feedback. For tagging data, we are the first to generate and study embeddings. Our results are of special interest for ontology and recommendation engineers, but also for any other researchers and practitioners of Semantic Web techniques.
-
Experimental Assessment of the Emergence of Awareness and Its Influence on Behavioral Changes: The Everyaware Lesson. Gravino, Pietro; Sîrbu, Alina; Becker, Martin; Servedio, Vito DP; Loreto, Vittorio in Participatory Sensing, Opinions and Collective Awareness (2017). 337--362.
-
Collective Sensing Platforms. Atzmueller, Martin; Becker, Martin; Mueller, Juergen in Participatory Sensing, Opinions and Collective Awareness (2017). 115--133.
-
Applications for Environmental Sensing in EveryAware. Atzmueller, Martin; Becker, Martin; Molino, Andrea; Mueller, Juergen; Peters, Jan; Sîrbu, Alina in Participatory Sensing, Opinions and Collective Awareness (2017). 135--155.
-
MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data. Becker, Martin; Lemmerich, Florian; Singer, Philipp; Strohmaier, Markus; Hotho, Andreas in Data Mining and Knowledge Discovery (2017).
Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding factors explaining the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of browsing a website or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes of heterogeneous sequence data. Each hypothesis is derived from existing literature, theory, or intuition and represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city and given some data, a hypothesis assuming tourists to be more likely to move towards points of interest than locals can be shown to be more plausible than a hypothesis assuming the opposite. Our approach incorporates such hypotheses as Bayesian priors in a generative mixed transition Markov chain model, and compares their plausibility utilizing Bayes factors. We discuss analytical and approximate inference methods for calculating the marginal likelihoods for Bayes factors, give guidance on interpreting the results, and illustrate our approach with several experiments on synthetic and empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind of analysis for studying sequential data in many application areas.
-
Learning Word Embeddings from Tagging Data: A methodological comparison. Niebler, Thomas; Hahn, Luzian; Hotho, Andreas (2017).
-
IP2Vec: Learning Similarities Between IP Addresses. Ring, Markus; Landes, Dieter; Dallmann, Alexander; Hotho, Andreas in 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (2017). 657-666.
-
Mining Subgroups with Exceptional Transition Behavior. Lemmerich, Florian; Becker, Martin; Singer, Philipp; Helic, Denis; Hotho, Andreas; Strohmaier, Markus. B. Krishnapuram, Shah, M., Smola, A. J., Aggarwal, C., Shen, D., Rastogi, R. (eds.) (2016). 965-974.
-
FolkTrails: Interpreting Navigation Behavior in a Social Tagging System. Niebler, Thomas; Becker, Martin; Zoller, Daniel; Doerfel, Stephan; Hotho, Andreas in CIKM '16 (2016).
Social tagging systems have established themselves as a quick and easy way to organize information by annotating resources with tags. In recent work, user behavior in social tagging systems was studied, that is, how users assign tags, and consume content. However, it is still unclear how users make use of the navigation options they are given. Understanding their behavior and differences in behavior of different user groups is an important step towards assessing the effectiveness of a navigational concept and of improving it to better suit the users’ needs. In this work, we investigate navigation trails in the popular scholarly social tagging system BibSonomy from six years of log data. We discuss dynamic browsing behavior of the general user population and show that different navigational subgroups exhibit different navigational traits. Furthermore, we provide strong evidence that the semantic nature of the underlying folksonomy is an essential factor for explaining navigation.
-
SparkTrails: A MapReduce Implementation of HypTrails for Comparing Hypotheses About Human Trails. Becker, Martin; Mewes, Hauke; Hotho, Andreas; Dimitrov, Dimitar; Lemmerich, Florian; Strohmaier, Markus. J. Bourdeau, Hendler, J., Nkambou, R., Horrocks, I., Zhao, B. Y. (eds.) (2016). 17-18.
-
Prediction of Happy Endings in German Novels. Zehe, Albin; Becker, Martin; Hettinger, Lena; Hotho, Andreas; Reger, Isabella; Jannidis, Fotis. P. Cellier, Charnois, T., Hotho, A., Matwin, S., Moens, M. -F., Toussaint, Y. (eds.) (2016). 9-16.
Identifying plot structure in novels is a valuable step towards automatic processing of literary corpora. We present an approach to classify novels as either having a happy ending or not. To achieve this, we use features based on different sentiment lexica as input for an SVM classifier, which yields an average F1-score of about 73%.
-
Extracting Semantics from Unconstrained Navigation on Wikipedia. Niebler, Thomas; Schlör, Daniel; Becker, Martin; Hotho, Andreas in KI (2016). 30(2) 163-168.
-
Comparison of non-invasive individual monitoring of the training and health of athletes with commercially available wearable technologies. Düking, Peter; Hotho, Andreas; Fuss, Franz Konstantin; Holmberg, Hans-Christer; Sperlich, Billy in Frontiers in Physiology (2016). 7(71)
Athletes adapt their training daily to optimize performance, as well as avoid fatigue, overtraining and other undesirable effects on their health. To optimize training load, each athlete must take his/her own personal objective and subjective characteristics into consideration and an increasing number of wearable technologies (wearables) provide convenient monitoring of various parameters. Accordingly, it is important to help athletes decide which parameters are of primary interest and which wearables can monitor these parameters most effectively. Here, we discuss the wearable technologies available for non-invasive monitoring of various parameters concerning an athlete's training and health. On the basis of these considerations, we suggest directions for future development. Furthermore, we propose that a combination of several wearables is most effective for accessing all relevant parameters, disturbing the athlete as little as possible, and optimizing performance and promoting health.
-
Posted, Visited, Exported: Altmetrics in the Social Tagging System BibSonomy. Zoller, Daniel; Doerfel, Stephan; Jäschke, Robert; Stumme, Gerd; Hotho, Andreas in Journal of Informetrics (2016). 10(3) 732--749.
In social tagging systems, like Mendeley, CiteULike, and BibSonomy, users can post, tag, visit, or export scholarly publications. In this paper, we compare citations with metrics derived from users’ activities (altmetrics) in the popular social bookmarking system BibSonomy. Our analysis, using a corpus of more than 250,000 publications published before 2010, reveals that overall, citations and altmetrics in BibSonomy are mildly correlated. Furthermore, grouping publications by user-generated tags results in topic-homogeneous subsets that exhibit higher correlations with citations than the full corpus. We find that posts, exports, and visits of publications are correlated with citations and even bear predictive power over future impact. Machine learning classifiers predict whether the number of citations that a publication receives in a year exceeds the median number of citations in that year, based on the usage counts of the preceding year. In that setup, a Random Forest predictor outperforms the baseline on average by seven percentage points.
-
MixedTrails: Bayesian Hypotheses Comparison on Heterogeneous Sequential Data. Becker, Martin; Lemmerich, Florian; Singer, Philipp; Strohmaier, Markus; Hotho, Andreas (2016).
Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding factors explaining the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of a website visit, or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes of heterogeneous sequence data. Each hypothesis represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city, a hypothesis assuming tourists to be more likely to move towards points of interest than locals can be shown to be more plausible with observed data than a hypothesis assuming the opposite. Our approach incorporates these beliefs as Bayesian priors in a generative mixed transition Markov chain model, and compares their plausibility utilizing Bayes factors. We discuss analytical and approximate inference methods for calculating the marginal likelihoods for Bayes factors, give guidance on interpreting the results, and illustrate our approach with several experiments on synthetic and empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind of analysis for studying sequential data in many application areas.
-
What Users Actually do in a Social Tagging System: A Study of User Behavior in BibSonomy. Doerfel, Stephan; Zoller, Daniel; Singer, Philipp; Niebler, Thomas; Hotho, Andreas; Strohmaier, Markus in ACM Transactions on the Web (2016). 10(2) 14:1--14:32.
Social tagging systems have established themselves as an important part in today’s web and have attracted the interest of our research community in a variety of investigations. Henceforth, several aspects of social tagging systems have been discussed and assumptions have emerged on which our community builds their work. Yet, testing such assumptions has been difficult due to the absence of suitable usage data in the past. In this work, we thoroughly investigate and evaluate four aspects about tagging systems, covering social interaction, retrieval of posted resources, the importance of the three different types of entities, users, resources, and tags, as well as connections between these entities’ popularity in posted and in requested content. For that purpose, we examine live server log data gathered from the real-world, public social tagging system BibSonomy. Our empirical results paint a mixed picture about the four aspects. While for some, typical assumptions hold to a certain extent, other aspects need to be reflected in a very critical light. Our observations have implications for the understanding of social tagging systems, and the way they are used on the web. We make the dataset used in this work available to other researchers.
-
Analyzing Features for the Detection of Happy Endings in German Novels. Jannidis, Fotis; Reger, Isabella; Zehe, Albin; Becker, Martin; Hettinger, Lena; Hotho, Andreas (2016).
With regard to a computational representation of literary plot, this paper looks at the use of sentiment analysis for happy ending detection in German novels. Its focus lies on the investigation of previously proposed sentiment features in order to gain insight about the relevance of specific features on the one hand and the implications of their performance on the other hand. Therefore, we study various partitionings of novels, considering the highly variable concept of "ending". We also show that our approach, even though still rather simple, can potentially lead to substantial findings relevant to literary studies.
-
Creation of Specific Flow-Based Training Data Sets for Usage Behaviour Classification. Otto, Florian; Ring, Markus; Landes, Dieter; Hotho, Andreas (2016). 437.
-
Significance Testing for the Classification of Literary Subgenres. Hettinger, Lena; Jannidis, Fotis; Reger, Isabella; Hotho, Andreas (2016).
-
Classification of Literary Subgenres. Hettinger, Lena; Jannidis, Fotis; Reger, Isabella; Hotho, Andreas (2016).
-
Extracting Semantics from Random Walks on Wikipedia: Comparing learning and counting methods. Dallmann, Alexander; Niebler, Thomas; Lemmerich, Florian; Hotho, Andreas R. West, Zia, L., Taraborelli, D., Leskovec, J. (eds.) (2016).
Semantic relatedness between words has been extracted from a variety of sources. In this ongoing work, we explore and compare several options for determining if semantic relatedness can be extracted from navigation structures in Wikipedia. In that direction, we first investigate the potential of representation learning techniques such as DeepWalk in comparison to previously applied methods based on counting co-occurrences. Since both methods are based on (random) paths in the network, we also study different approaches to generate paths from Wikipedia link structure. For this task, we do not only consider the link structure of Wikipedia, but also actual navigation behavior of users. Finally, we analyze if semantics can also be extracted from smaller subsets of the Wikipedia link network. As a result we find that representation learning techniques mostly outperform the investigated co-occurrence counting methods on the Wikipedia network. However, we find that this is not the case for paths sampled from human navigation behavior.
-
Photowalking the city: comparing hypotheses about urban photo trails on Flickr. Becker, Martin; Singer, Philipp; Lemmerich, Florian; Hotho, Andreas; Helic, Denis; Strohmaier, Markus (2015).
-
Participatory Patterns in an International Air Quality Monitoring Initiative. Sîrbu, Alina; Becker, Martin; Caminiti, Saverio; De Baets, Bernard; Elen, Bart; Francis, Louise; Gravino, Pietro; Hotho, Andreas; Ingarra, Stefano; Loreto, Vittorio; Molino, Andrea; Mueller, Juergen; Peters, Jan; Ricchiuti, Ferdinando; Saracino, Fabio; Servedio, Vito D. P.; Stumme, Gerd; Theunis, Jan; Tria, Francesca; Van den Bossche, Joris (2015).
The issue of sustainability is at the top of the political and societal agenda, being considered of extreme importance and urgency. Human individual action impacts the environment both locally (e.g., local air/water quality, noise disturbance) and globally (e.g., climate change, resource use). Urban environments represent a crucial example, with an increasing realization that the most effective way of producing a change is involving the citizens themselves in monitoring campaigns (a citizen science bottom-up approach). This is possible by developing novel technologies and IT infrastructures enabling large citizen participation. Here, in the wider framework of one of the first such projects, we show results from an international competition where citizens were involved in mobile air pollution monitoring using low cost sensing devices, combined with a web-based game to monitor perceived levels of pollution. Measures of shift in perceptions over the course of the campaign are provided, together with insights into participatory patterns emerging from this study. Interesting effects related to inertia and to direct involvement in measurement activities rather than indirect information exposure are also highlighted, indicating that direct involvement can enhance learning and environmental awareness. In the future, this could result in better adoption of policies towards decreasing pollution.
-
Genre classification on German novels. Hettinger, Lena; Becker, Martin; Reger, Isabella; Jannidis, Fotis; Hotho, Andreas (2015).
-
Participatory Patterns in an International Air Quality Monitoring Initiative. Sîrbu, Alina; Becker, Martin; Caminiti, Saverio; De Baets, Bernard; Elen, Bart; Francis, Louise; Gravino, Pietro; Hotho, Andreas; Ingarra, Stefano; Loreto, Vittorio; Molino, Andrea; Mueller, Juergen; Peters, Jan; Ricchiuti, Ferdinando; Saracino, Fabio; Servedio, Vito D. P.; Stumme, Gerd; Theunis, Jan; Tria, Francesca; Van den Bossche, Joris in PLoS ONE (2015). 10(8) e0136763.
The issue of sustainability is at the top of the political and societal agenda, being considered of extreme importance and urgency. Human individual action impacts the environment both locally (e.g., local air/water quality, noise disturbance) and globally (e.g., climate change, resource use). Urban environments represent a crucial example, with an increasing realization that the most effective way of producing a change is involving the citizens themselves in monitoring campaigns (a citizen science bottom-up approach). This is possible by developing novel technologies and IT infrastructures enabling large citizen participation. Here, in the wider framework of one of the first such projects, we show results from an international competition where citizens were involved in mobile air pollution monitoring using low cost sensing devices, combined with a web-based game to monitor perceived levels of pollution. Measures of shift in perceptions over the course of the campaign are provided, together with insights into participatory patterns emerging from this study. Interesting effects related to inertia and to direct involvement in measurement activities rather than indirect information exposure are also highlighted, indicating that direct involvement can enhance learning and environmental awareness. In the future, this could result in better adoption of policies towards decreasing pollution.
-
VizTrails: An Information Visualization Tool for Exploring Geographic Movement Trajectories. Becker, Martin; Singer, Philipp; Lemmerich, Florian; Hotho, Andreas; Helic, Denis; Strohmaier, Markus in HT '15 (2015). 319--320.
Understanding the way people move through urban areas represents an important problem that has implications for a range of societal challenges such as city planning, public transportation, or crime analysis. In this paper, we present an interactive visualization tool called VizTrails for exploring and understanding such human movement. It features visualizations that show aggregated statistics of trails for geographic areas that correspond to grid cells on a map, e.g., on the number of users passing through or on cells commonly visited next. Amongst other features, the system allows overlaying the map with the results of SPARQL queries in order to relate the observed trajectory statistics to their geo-spatial context, e.g., considering a city's points of interest. The system's functionality is demonstrated using trajectory examples extracted from the social photo sharing platform Flickr. Overall, VizTrails facilitates deeper insights into geo-spatial trajectory data by enabling interactive exploration of aggregated statistics and providing geo-spatial context.
-
ConDist: A Context-Driven Categorical Distance Measure. Ring, Markus; Otto, Florian; Becker, Martin; Niebler, Thomas; Landes, Dieter; Hotho, Andreas ECMLPKDD2015 (ed.) (2015).
-
Exploratory subgroup analytics on ubiquitous data. Atzmueller, Martin; Mueller, Juergen; Becker, Martin in Mining, Modeling and Recommending 'Things' in Social Media (2015). (Vol. 8940) 1--20.
-
Automatic Threshold Calculation for the Categorical Distance Measure ConDist. Ring, Markus; Landes, Dieter; Hotho, Andreas in CEUR Workshop Proceedings, R. Bergmann, Görg, S., Müller, G. (eds.) (2015). (Vol. 1458) 52-63.
-
Evaluating Emergent Semantics in Folksonomies on Human Intuition. Niebler, Thomas; Becker, Martin; Zoller, Daniel; Doerfel, Stephan; Hotho, Andreas (2015).
-
Hyptrails: A bayesian approach for comparing hypotheses about human trails. Singer, P.; Helic, D.; Hotho, A.; Strohmaier, M. (2015).
-
Media Bias in German Online Newspapers. Dallmann, Alexander; Lemmerich, Florian; Zoller, Daniel; Hotho, Andreas (2015).
-
Modeling and Extracting Load Intensity Profiles. v. Kistowski, Jóakim; Herbst, Nikolas; Zoller, Daniel; Kounev, Samuel; Hotho, Andreas (2015).
Today’s system developers and operators face the challenge of creating software systems that make efficient use of dynamically allocated resources under highly variable and dynamic load profiles, while at the same time delivering reliable performance. Benchmarking of systems under these constraints is difficult, as state-of-the-art benchmarking frameworks provide only limited support for emulating such dynamic and highly variable load profiles for the creation of realistic workload scenarios. Industrial benchmarks typically confine themselves to workloads with constant or stepwise increasing loads. Alternatively, they support replaying of recorded load traces. Statistical load intensity descriptions also do not sufficiently capture concrete pattern load profile variations over time. To address these issues, we present the Descartes Load Intensity Model (DLIM). DLIM provides a modeling formalism for describing load intensity variations over time. A DLIM instance can be used as a compact representation of a recorded load intensity trace, providing a powerful tool for benchmarking and performance analysis. As manually obtaining DLIM instances can be time consuming, we present three different automated extraction methods, which also help to enable autonomous system analysis for self-adaptive systems. Model expressiveness is validated using the presented extraction methods. Extracted DLIM instances exhibit a median modeling error of 12.4% on average over nine different real-world traces covering between two weeks and seven months. Additionally, extraction methods perform orders of magnitude faster than existing time series decomposition approaches.
-
On Publication Usage in a Social Bookmarking System. Zoller, Daniel; Doerfel, Stephan; Jäschke, Robert; Stumme, Gerd; Hotho, Andreas (2015).
Scholarly success is traditionally measured in terms of citations to publications. With the advent of publication management and digital libraries on the web, scholarly usage data has become a target of investigation and new impact metrics computed on such usage data have been proposed – so-called altmetrics. In scholarly social bookmarking systems, scientists collect and manage publication metadata and thus reveal their interest in these publications. In this work, we investigate connections between usage metrics and citations, and find posts, exports, and page views of publications to be correlated to citations.
-
MicroTrails: Comparing Hypotheses About Task Selection on a Crowdsourcing Platform. Becker, Martin; Borchert, Kathrin; Hirth, Matthias; Mewes, Hauke; Hotho, Andreas; Tran-Gia, Phuoc in i-KNOW '15 (2015). 10:1--10:8.
-
Proceedings of the 1st International Workshop on Interactions between Data Mining and Natural Language Processing co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, DMNLP@PKDD/ECML 2014, Nancy, France, September 15, 2014 Cellier, Peggy; Charnois, Thierry; Hotho, Andreas; Matwin, Stan; Moens, Marie-Francine; Toussaint, Yannick in CEUR Workshop Proceedings (2014). (Vol. 1202) CEUR-WS.org.
-
HypTrails: A Bayesian Approach for Comparing Hypotheses about Human Trails on the Web. Singer, Philipp; Helic, Denis; Hotho, Andreas; Strohmaier, Markus (2014).
When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music play lists. Understanding the factors that drive the production of these trails can be useful for e.g., improving underlying network structures, predicting user clicks or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors on the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaption of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including website navigation, business reviews and online music played. Our work expands the repertoire of methods available for studying human trails on the Web.
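The comparison idea in this abstract (hypotheses as Dirichlet priors on a Markov chain, ranked by Bayes factors) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy transition counts, and the concentration parameter `k` are hypothetical, and the prior is built by simply scaling the belief matrix instead of using the trial roulette elicitation mentioned above.

```python
import math


def log_evidence(counts, prior):
    """Log marginal likelihood of transition counts under row-wise Dirichlet priors.

    counts[i][j] is the number of observed transitions from state i to j,
    prior[i][j] is the Dirichlet pseudo-count for the same transition.
    """
    total = 0.0
    for n_row, a_row in zip(counts, prior):
        # Normalizing terms of the Dirichlet-multinomial for this row.
        total += math.lgamma(sum(a_row)) - math.lgamma(sum(a_row) + sum(n_row))
        for n, a in zip(n_row, a_row):
            total += math.lgamma(a + n) - math.lgamma(a)
    return total


def prior_from_hypothesis(hypothesis, k):
    """Assumed simplification: uniform pseudo-count 1 plus k times the belief."""
    return [[1.0 + k * p for p in row] for row in hypothesis]


# Toy data: two states, observed trails mostly switch between them.
counts = [[1, 9], [8, 2]]
h_switch = [[0.1, 0.9], [0.9, 0.1]]  # belief: users tend to switch state
h_stay = [[0.9, 0.1], [0.1, 0.9]]    # belief: users tend to stay

k = 10  # hypothetical strength of belief in each hypothesis
log_bf = (log_evidence(counts, prior_from_hypothesis(h_switch, k))
          - log_evidence(counts, prior_from_hypothesis(h_stay, k)))
print("log Bayes factor (switch vs. stay):", log_bf)
```

A positive log Bayes factor favors the first hypothesis. In this toy setting the switching hypothesis is expected to win, because the counts are concentrated on the off-diagonal transitions.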
-
Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014) co-located with the 8th ACM Conference on Recommender Systems (RecSys 2014), Foster City, CA, USA, October 6, 2014 Jannach, Dietmar; Freyne, Jill; Geyer, Werner; Guy, Ido; Hotho, Andreas; Mobasher, Bamshad in CEUR Workshop Proceedings (2014). (Vol. 1271) CEUR-WS.org.
-
Of course we share! Testing Assumptions about Social Tagging Systems. Doerfel, Stephan; Zoller, Daniel; Singer, Philipp; Niebler, Thomas; Hotho, Andreas; Strohmaier, Markus (2014).
Social tagging systems have established themselves as an important part in today's web and have attracted the interest from our research community in a variety of investigations. The overall vision of our community is that simply through interactions with the system, i.e., through tagging and sharing of resources, users would contribute to building useful semantic structures as well as resource indexes using uncontrolled vocabulary not only due to the easy-to-use mechanics. Henceforth, a variety of assumptions about social tagging systems have emerged, yet testing them has been difficult due to the absence of suitable data. In this work we thoroughly investigate three available assumptions - e.g., is a tagging system really social? - by examining live log data gathered from the real-world public social tagging system BibSonomy. Our empirical results indicate that while some of these assumptions hold to a certain extent, other assumptions need to be reflected and viewed in a very critical light. Our observations have implications for the design of future search and other algorithms to better reflect the actual user behavior.
-
The sixth ACM RecSys workshop on recommender systems and the social web. Jannach, Dietmar; Freyne, Jill; Geyer, Werner; Guy, Ido; Hotho, Andreas; Mobasher, Bamshad (2014). 395.
-
How Social is Social Tagging? Doerfel, Stephan; Zoller, Daniel; Singer, Philipp; Niebler, Thomas; Hotho, Andreas; Strohmaier, Markus in WWW 2014 (2014).
-
Ubicon and its Applications for Ubiquitous Social Computing. Atzmueller, Martin; Becker, Martin; Kibanov, Mark; Scholz, Christoph; Doerfel, Stephan; Hotho, Andreas; Macek, Bjoern-Elmar; Mitzlaff, Folke; Mueller, Juergen; Stumme, Gerd in New Review of Hypermedia and Multimedia (2014). 20(1) 53--77.
The combination of ubiquitous and social computing is an emerging research area which integrates different but complementary methods, techniques and tools. In this paper, we focus on the Ubicon platform, its applications, and a large spectrum of analysis results. Ubicon provides an extensible framework for building and hosting applications targeting both ubiquitous and social environments. We summarize the architecture and exemplify its implementation using four real-world applications built on top of Ubicon. In addition, we discuss several scientific experiments in the context of these applications in order to give a better picture of the potential of the framework, and discuss analysis results using several real-world data sets collected utilizing Ubicon.
-
Folksonomies. Singer, Philipp; Niebler, Thomas; Hotho, Andreas; Strohmaier, Markus in Encyclopedia of Social Network Analysis and Mining (2014). 542--547.
-
Evaluating Assumptions about Social Tagging - A Study of User Behavior in BibSonomy. Doerfel, Stephan; Zoller, Daniel; Singer, Philipp; Niebler, Thomas; Hotho, Andreas; Strohmaier, Markus in CEUR Workshop Proceedings, T. Seidl, Hassani, M., Beecks, C. (eds.) (2014). (Vol. 1226) 18--19.
-
The social distributional hypothesis: a pragmatic proxy for homophily in online social networks. Mitzlaff, Folke; Atzmueller, Martin; Hotho, Andreas; Stumme, Gerd in Social Network Analysis and Mining (2014). 4(1)
Applications of the Social Web are ubiquitous and have become an integral part of everyday life: Users make friends, for example, with the help of online social networks, share thoughts via Twitter, or collaboratively write articles in Wikipedia. All such interactions leave digital traces; thus, users participate in the creation of heterogeneous, distributed, collaborative data collections. In linguistics, the
-
Finding Enclosures for Linear Systems Using Interval Matrix Multiplication in CUDA. Dallmann, Alexander; Beck, Philip-Daniel; von Gudenberg, Jürgen Wolff in Parallel Processing and Applied Mathematics, R. Wyrzykowski, Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) (2014). (Vol. 8385) 582-590.
In this paper we present CUDA kernels that compute an interval matrix product. Starting from a naive implementation we investigate possible speedups using commonly known techniques from standard matrix multiplication. We also evaluate the achieved speedup when our kernels are used to accelerate a variant of an existing algorithm that finds an enclosure for the solution of a linear system. Moreover the quality of our enclosure is discussed.
-
Subjective vs. Objective Data: Bridging the Gap. Becker, Martin; Hotho, Andreas; Mueller, Juergen; Kibanov, Mark; Atzmueller, Martin; Stumme, Gerd (2014).
Sensor data is objective. But when measuring our environment, measured values are contrasted with our perception, which is always subjective. This makes interpreting sensor measurements difficult for a single person in her personal environment. In this context, the EveryAware project directly connects the concepts of objective sensor data with subjective impressions and perceptions by providing a collective sensing platform with several client applications that allow users to explicitly associate those two data types. The goal is to provide the user with personalized feedback, a characterization of the global as well as her personal environment, and enable her to position her perceptions in this global context. In this poster we summarize the collected data of two EveryAware applications, namely WideNoise for noise measurements and AirProbe for participatory air quality sensing. Basic insights are presented including user activity, learning processes and sensor data to perception correlations. These results provide an outlook on how this data can further be used to understand the connection between sensor data and perceptions.
-
Computing semantic relatedness from human navigational paths on Wikipedia. Singer, Philipp; Niebler, Thomas; Strohmaier, Markus; Hotho, Andreas in WWW '13 Companion, ACM (ed.) (2013). 171--172.
This paper presents a novel approach for computing semantic relatedness between concepts on Wikipedia by using human navigational paths for this task. Our results suggest that human navigational paths provide a viable source for calculating semantic relatedness between concepts on Wikipedia. We also show that we can improve accuracy by intelligent selection of path corpora based on path characteristics indicating that not all paths are equally useful. Our work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task.
-
Informationelle Selbstbestimmung Im Web 2.0 Chancen Und Risiken Sozialer Verschlagwortungssysteme. Doerfel, Stephan; Hotho, Andreas; Kartal-Aydemir, Aliye; Roßnagel, Alexander; Stumme, Gerd (2013). Vieweg + Teubner Verlag.
-
Semantics of User Interaction in Social Media. Mitzlaff, Folke; Atzmueller, Martin; Stumme, Gerd; Hotho, Andreas in Complex Networks IV, G. Ghoshal, Poncela-Casasnovas, J., Tolksdorf, R. (eds.) (2013). (Vol. 476)
-
Awareness and Learning in Participatory Noise Sensing. Becker, Martin; Caminiti, Saverio; Fiorella, Donato; Francis, Louise; Gravino, Pietro; Haklay, Mordechai (Muki); Hotho, Andreas; Loreto, Vittorio; Mueller, Juergen; Ricchiuti, Ferdinando; Servedio, Vito D. P.; Sîrbu, Alina; Tria, Francesca in PLoS ONE (2013). 8(12) e81638.
The development of ICT infrastructures has facilitated the emergence of new paradigms for looking at society and the environment over the last few years. Participatory environmental sensing, i.e. directly involving citizens in environmental monitoring, is one example, which is hoped to encourage learning and enhance awareness of environmental issues. In this paper, an analysis of the behaviour of individuals involved in noise sensing is presented. Citizens have been involved in noise measuring activities through the WideNoise smartphone application. This application has been designed to record both objective (noise samples) and subjective (opinions, feelings) data. The application has been open to be used freely by anyone and has been widely employed worldwide. In addition, several test cases have been organised in European countries. Based on the information submitted by users, an analysis of emerging awareness and learning is performed. The data show that changes in the way the environment is perceived after repeated usage of the application do appear. Specifically, users learn how to recognise different noise levels they are exposed to. Additionally, the subjective data collected indicate an increased user involvement in time and a categorisation effect between pleasant and less pleasant environments.
-
Difference-Based Estimates for Generalization-Aware Subgroup Discovery. Lemmerich, Florian; Becker, Martin; Puppe, Frank in Lecture Notes in Computer Science, H. Blockeel, Kersting, K., Nijssen, S., Zelezný, F. (eds.) (2013). (Vol. 8190) 288-303.
-
A Generic Platform for Ubiquitous and Subjective Data. Becker, Martin; Mueller, Juergen; Hotho, Andreas; Stumme, Gerd (2013). New York, NY, USA.
An increasing number of platforms like Xively or ThingSpeak are available to manage ubiquitous sensor data enabling the Internet of Things. Strict data formats allow interoperability and informative visualizations, supporting the development of custom user applications. Yet, these strict data formats as well as the common feed-centric approach limit the flexibility of these platforms. We aim at providing a concept that supports data ranging from text-based formats like JSON to images and video footage. Furthermore, we introduce the concept of extensions, which allows enriching existing data points with additional information, thus taking a data-point-centric approach. This enables us to gain semantic and user-specific context by attaching subjective data to objective values. This paper provides an overview of our architecture including concept, implementation details and present applications. We distinguish our approach from several other systems and describe two sensing applications, namely AirProbe and WideNoise, that were implemented for our platform.
-
Exploiting Structural Consistencies with Stacked Conditional Random Fields. Kluegl, Peter; Toepfer, Martin; Lemmerich, Florian; Hotho, Andreas; Puppe, Frank in Mathematical Methodologies in Pattern Recognition and Machine Learning Springer Proceedings in Mathematics & Statistics (2013). 30 111-125.
Conditional Random Fields (CRF) are popular methods for labeling unstructured or textual data. Like many machine learning approaches, these undirected graphical models assume the instances to be independently distributed. However, in real-world applications data is grouped in a natural way, e.g., by its creation context. The instances in each group often share additional structural consistencies. This paper proposes a domain-independent method for exploiting these consistencies by combining two CRFs in a stacked learning framework. We apply rule learning collectively on the predictions of an initial CRF for one context to acquire descriptions of its specific properties. Then, we utilize these descriptions as dynamic and high quality features in an additional (stacked) CRF. The presented approach is evaluated with a real-world dataset for the segmentation of references and achieves a significant reduction of the labeling error.
-
Proceedings of the Fifth ACM RecSys Workshop on Recommender Systems and the Social Web co-located with the 7th ACM Conference on Recommender Systems (RecSys 2013), Hong Kong, China, October 13, 2013. Mobasher, Bamshad; Jannach, Dietmar; Geyer, Werner; Freyne, Jill; Hotho, Andreas; Anand, Sarabjot Singh; Guy, Ido in CEUR Workshop Proceedings (2013). (Vol. 1066) CEUR-WS.org.
-
How Tagging Pragmatics Influence Tag Sense Discovery in Social Annotation Systems. Niebler, Thomas; Singer, Philipp; Benz, Dominik; Körner, Christian; Strohmaier, Markus; Hotho, Andreas in Advances in Information Retrieval, P. Serdyukov, Braslavski, P., Kuznetsov, S. O., Kamps, J., Rüger, S., Agichtein, E., Segalovich, I., Yilmaz, E. (eds.) (2013). (Vol. 7814) 86-97.
The presence of emergent semantics in social annotation systems has been reported in numerous studies. Two important problems in this context are the induction of semantic relations among tags and the discovery of different senses of a given tag. While a number of approaches for discovering tag senses exist, little is known about which
-
Tag Recommendations for SensorFolkSonomies. Mueller, Juergen; Doerfel, Stephan; Becker, Martin; Hotho, Andreas; Stumme, Gerd (2013). (Vol. 1066)
With the rising popularity of smart mobile devices, sensor data-based applications have become more and more popular. Their users record data during their daily routine or specifically for certain events. The application WideNoise Plus allows users to record sound samples and to annotate them with perceptions and tags. The app is being used to document and map the soundscape all over the world. The procedure of recording, including the assignment of tags, has to be as easy-to-use as possible. We therefore discuss the application of tag recommender algorithms in this particular scenario. We show that this task is fundamentally different from the well-known tag recommendation problem in folksonomies, as users no longer tag fixed resources but rather sensory data and impressions. The scenario requires efficient recommender algorithms that are able to run on the mobile device, since Internet connectivity cannot be assumed to be available. Therefore, we evaluate the performance of several tag recommendation algorithms and discuss their applicability in the mobile sensing use-case.
-
Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia. Singer, Philipp; Niebler, Thomas; Strohmaier, Markus; Hotho, Andreas in International Journal on Semantic Web and Information Systems (IJSWIS) (2013). 9(4) 41--70.
In this article, the authors present a novel approach for computing semantic relatedness and conduct a large-scale study of it on Wikipedia. Unlike existing semantic analysis methods that utilize Wikipedia’s content or link structure, the authors propose to use human navigational paths on Wikipedia for this task. The authors obtain 1.8 million human navigational paths from a semi-controlled navigation experiment – a Wikipedia-based navigation game, in which users are required to find short paths between two articles in a given Wikipedia article network. The authors’ results are intriguing: They suggest that (i) semantic relatedness computed from human navigational paths may be more precise than semantic relatedness computed from Wikipedia’s plain link structure alone and (ii) that not all navigational paths are equally useful. Intelligent selection based on path characteristics can improve accuracy. The authors’ work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task.
-
Ubiquitous Social Media Analysis Third International Workshops, MUSE 2012, Bristol, UK, September 24, 2012, and MSM 2012, Milwaukee, WI, USA, June 25, 2012, Revised Selected Papers Atzmueller, Martin; Chin, Alvin; Helic, Denis; Hotho, Andreas (2013). Imprint: Springer, Berlin, Heidelberg.
-
Deeper Into the Folksonomy Graph: FolkRank Adaptations and Extensions for Improved Tag Recommendations. Landia, Nikolas; Doerfel, Stephan; Jäschke, Robert; Anand, Sarabjot Singh; Hotho, Andreas; Griffiths, Nathan in cs.IR (2013). 1310.1498
The information contained in social tagging systems is often modelled as a graph of connections between users, items and tags. Recommendation algorithms such as FolkRank, have the potential to leverage complex relationships in the data, corresponding to multiple hops in the graph. We present an in-depth analysis and evaluation of graph models for social tagging data and propose novel adaptations and extensions of FolkRank to improve tag recommendations. We highlight implicit assumptions made by the widely used folksonomy model, and propose an alternative and more accurate graph-representation of the data. Our extensions of FolkRank address the new item problem by incorporating content data into the algorithm, and significantly improve prediction results on unpruned datasets. Our adaptations address issues in the iterative weight spreading calculation that potentially hinder FolkRank's ability to leverage the deep graph as an information source. Moreover, we evaluate the benefit of considering each deeper level of the graph, and present important insights regarding the characteristics of social tagging data in general. Our results suggest that the base assumption made by conventional weight propagation methods, that closeness in the graph always implies a positive relationship, does not hold for the social tagging domain.