A positional keyword-based approach to inferring fine-grained message formats

Jiang, J, Versteeg, S, Han, J, Hossain, M and Schneider, J 2020, 'A positional keyword-based approach to inferring fine-grained message formats', Future Generation Computer Systems, vol. 102, pp. 369-381.

Document type: Journal Article
Collection: Journal Articles

Title A positional keyword-based approach to inferring fine-grained message formats
Author(s) Jiang, J
Versteeg, S
Han, J
Hossain, M
Schneider, J
Year 2020
Journal name Future Generation Computer Systems
Volume number 102
Start page 369
End page 381
Total pages 13
Publisher Elsevier BV * North-Holland
Abstract Message format extraction, the process of revealing the message syntax without access to the protocol specification, is important for a variety of applications such as service virtualization and network security. In this paper, we propose P-token, which mines fine-grained message formats from network traces. The novelty of our approach is twofold: a positional keyword identification technique and a two-level hierarchical clustering strategy. Positional keywords are based on the insight that keywords or reserved words usually occur at relatively fixed positions in the messages. By associating positions as meta-information with keywords, we can more accurately distinguish keywords from message payload data. After identification, the positional keywords are used as features to cluster the messages using density peaks clustering. We then perform another level of clustering to refine the clusters with low homogeneity. Finally, the message format of each cluster is extracted based on the observed ordering of keywords. P-token improves on the current state-of-the-art techniques by successfully addressing two challenges that commonly afflict existing keyword-based format extraction methods: message keyword mis-identification and message format over-generalization. We have conducted experiments on services and applications using various protocols, including SOAP, LDAP, IMS and a RESTful service. Our experimental results show that P-token outperforms existing methods in extracting message formats.
Subject Information Retrieval and Web Search
Social and Community Informatics
Human Information Behaviour
Keyword(s) Positional keyword
Protocol message formats
Two-level clustering
DOI - identifier 10.1016/j.future.2019.08.011
Copyright notice © 2019 Elsevier B.V. All rights reserved.
ISSN 0167-739X
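
Illustrative note: the positional keyword idea described in the abstract can be sketched roughly as follows. This is not the authors' implementation. It assumes whitespace tokenization and a simple frequency threshold, and the function name positional_keywords, the parameter min_support and the sample messages are all hypothetical. The sketch pairs each token with its index in the message and keeps the pairs that recur across most messages.

```python
from collections import Counter


def positional_keywords(messages, min_support=0.5):
    """Return (position, token) pairs seen in at least min_support of messages.

    Assumption: tokens are whitespace-separated and "position" is the token
    index. The paper may define tokens, positions and the selection
    criterion differently.
    """
    counts = Counter()
    for msg in messages:
        # Pair every token with its index; each pair occurs once per message.
        counts.update(enumerate(msg.split()))
    threshold = min_support * len(messages)
    return {pair for pair, n in counts.items() if n >= threshold}


# Hypothetical trace: "END" is a keyword at position 2, but in the third
# message it also appears at position 1 as payload data.
messages = [
    "LOGIN alice END",
    "LOGIN bob END",
    "LOGIN END END",
]
keywords = positional_keywords(messages)
```

In this toy trace the pair (1, "END") falls below the threshold, so the payload occurrence of END is not mistaken for a keyword, whereas a position-free frequency count would not separate the two uses.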
Citation counts Web of Science: cited 0 times
Scopus: cited 0 times
Created: Thu, 09 Apr 2020, 13:20:00 EST by Catalyst Administrator