
Software Performance: Data-type profiling for perf and a few more tools


Based on what I read a few days ago in “Data-type profiling for perf”, 2024 will be an interesting year for software performance:

  • “Tooling for profiling the effects of memory usage and layout has always lagged behind that for profiling processor activity, so Namhyung Kim’s patch set for data-type profiling in perf is a welcome addition. It provides aggregated breakdowns of memory accesses by data type that can inform structure layout and access pattern changes. Existing tools have either, like heaptrack, focused on profiling allocations, or, like perf mem, on accounting memory accesses only at the address level. This new work builds on the latter, using DWARF debugging information to correlate memory operations with their source-level types.”

There’s also the author’s presentation from November 2023:

  • “Memory accesses can suffer from problems like poor spacial and temporal locality, as well as false sharing of cache lines. Existing presentations of profile data, such data from the perspective of code, can make it difficult to reason as to what the problems are and to work out what the fixes should be. A typical fix may be to reorder variables within a data structure. In this work Namhyung Kim will present ongoing work combining perf event and DWARF debug information, in order to correlate samples and present data type of the variables accessed within a program. However, DWARF debug information is not reliable in enabling a good understanding of variables accessed. The presentation will discuss the state of data type profiling and its addition to the Linux perf tool, how toolchain limitations are worked around by the tool, and how toolchains can be improved for data type profiling in the future.”
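The “reorder variables within a data structure” fix mentioned in the abstract is easier to picture with a concrete layout. The following Python sketch is only an illustration, not output of perf or of Namhyung Kim’s work: it uses the standard `ctypes` module to lay out two hypothetical structs according to the platform’s C alignment rules, then prints their sizes and field offsets. The struct and field names are made up for this example.

```python
import ctypes

# Hypothetical struct in "natural" declaration order: flag, counter, flag.
class Original(ctypes.Structure):
    _fields_ = [
        ("active", ctypes.c_char),
        ("counter", ctypes.c_int64),
        ("dirty", ctypes.c_char),
    ]

# Same fields, largest first, so the two 1-byte flags share the trailing padding.
class Reordered(ctypes.Structure):
    _fields_ = [
        ("counter", ctypes.c_int64),
        ("active", ctypes.c_char),
        ("dirty", ctypes.c_char),
    ]

for cls in (Original, Reordered):
    # Each field is exposed as a class attribute carrying its byte offset.
    offsets = {name: getattr(cls, name).offset for name, _ in cls._fields_}
    print(cls.__name__, ctypes.sizeof(cls), offsets)
```

On a typical x86-64 Linux system I would expect `Original` to take 24 bytes and `Reordered` 16, because the 8-byte counter forces padding after the first flag; the exact numbers depend on the platform’s ABI. Denser structs mean fewer cache lines touched per access, which is the kind of change a per-type breakdown of memory accesses can point you towards.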

A few more relevant tools for people doing performance work:

Read the rest of this entry »
 

Posted on December 29, 2023 in Books, Linux, Programming, python

 


Computing History meets Personal History: Ellis D. Kropotechev and ZEUS, A Marvelous Time-Sharing System


Recently I came across the following message from the Twitter account of SDF Public Access UNIX System, and I realized that I have a personal connection to the whole thing, albeit a weak one. “How so?” you might ask; well, keep on reading…

It points to a short movie from 1967, “Ellis D. Kropotechev and Zeus, A Marvelous Time-Sharing Device”: “Set in the Stanford University computer center and cafeteria, the film gives the viewer a feel for the process of computer programming in the 1960s. It illustrates the transition from punched card batch processing computers (using teletypes) to time-sharing computing (using video terminals). Additional technologies employed throughout the film include the IBM 7090, the IBM 26 Printing Card Punch, the Zeus time-sharing program and the Algol/Gogol computer languages. The film’s soundtrack includes “Cool, Calm and Collected” by The Rolling Stones (1967).”

Read the rest of this entry »
 

Posted on June 3, 2022 in CogSci, Lisp, Programming, psychology, History

 


AI & Math: Hey, Alexa, what is 200 factorial?


Today I decided to ask Amazon Alexa some difficult questions about very big numbers:

Still, it does better than the current Google Calculator: Amazon Alexa at least tries before throwing in the towel 😉
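For comparison, a language with arbitrary-precision integers answers the question exactly. A minimal Python sketch using the standard library’s `math.factorial`; the `trailing_zeros` helper is my own illustration of Legendre’s formula, included as a cross-check on the result:

```python
import math

def trailing_zeros(n: int) -> int:
    """Number of trailing zeros of n!, via Legendre's formula for the power of 5.

    Factors of 2 always outnumber factors of 5, so each factor of 5 yields one zero.
    """
    count, power = 0, 5
    while power <= n:
        count += n // power
        power *= 5
    return count

value = math.factorial(200)   # exact, arbitrary-precision integer
digits = str(value)

print(len(digits))            # 375 digits
print(trailing_zeros(200))    # 49, i.e. 200//5 + 200//25 + 200//125 = 40 + 8 + 1
print(digits[:10] + "...")    # the leading digits
```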

Read the rest of this entry »
 

Posted on March 7, 2022 in CogSci, Math, Programming

 


Diversity in Belgium: facts and maps


My wife recently drew my attention to a very interesting report about diversity in Belgium: in a piece dated 8 January 2022, written by Tobias Santens for VRT NWS, one of the mainstream media outlets, you can find interactive information visualizations of some diversity figures in Belgium.

The piece starts with the following (automatic translation): “Unknown often remains unloved: discover more about diversity and integration in your municipality. Almost 1 in 3 Flemish people thinks that there are too many people of different origin living in their municipality. 7 out of 10 Flemish people almost never say they have a chat. The differences between municipalities are large. VRT NWS delved into the figures that the Agency for Home Affairs recently bundled in the Local Integration Scan. Search for your municipality below to view the situation in your area.”

After the introduction and its catchy phrases come interactive maps where you can enter where you live in Belgium and read about people’s perceptions of people with an immigration background:

When I selected where I live, I was presented with the following information (translated from the original Dutch by Google Translate):

Read the rest of this entry »
 

Posted on January 9, 2022 in General

 


How to activate hotplugged / newly added RAM in Linux?


These days I’m busy helping one of our clients build a data platform for their renewable energy project, in their own data center, using Nutanix. I asked their tech support to upgrade the RAM and CPU cores of one of the virtual machines, which was already running Debian GNU/Linux.

Should I buy this htop t-shirt, or go on a vacation? 😉

When they informed me that they had increased the number of CPU cores and the amount of RAM on the Nutanix side, I rebooted the server. To my surprise, even though htop showed the correct number of CPU cores, the amount of RAM seemed to have stayed the same! Where was the missing RAM? The Nutanix management system showed that it had allocated the requested amount of RAM to the server, but unlike the newly added CPU cores, we simply couldn’t see the expected amount of RAM from within the Debian GNU/Linux virtual machine.

After a brief investigation, we discovered that this had to do with the memory hotplug mechanism of the Linux kernel: lsmem showed the ranges of available memory, with the ranges corresponding to the missing amount marked as offline.
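As far as I know, lsmem and chmem work on top of the kernel’s memory-hotplug interface in sysfs: each memory block appears as `/sys/devices/system/memory/memoryN` with a `state` file (`online` or `offline`), and `block_size_bytes` holds the block size in hexadecimal. The Python sketch below reads that interface directly. The helper names are hypothetical, the root directory is a parameter so the code can be pointed at a test tree, and `online_all` needs root privileges on a real system; it only illustrates the mechanism and is not a replacement for chmem.

```python
from pathlib import Path

# Where the Linux kernel exposes hot-pluggable memory blocks (memory0, memory1, ...).
MEMORY_ROOT = Path("/sys/devices/system/memory")

def block_size(root: Path = MEMORY_ROOT) -> int:
    """Size of one memory block in bytes; sysfs stores it as a hex string."""
    return int((root / "block_size_bytes").read_text().strip(), 16)

def memory_blocks(root: Path = MEMORY_ROOT) -> dict[int, str]:
    """Map block number -> state string such as "online" or "offline"."""
    states = {}
    for entry in root.glob("memory[0-9]*"):
        state_file = entry / "state"
        if state_file.exists():
            states[int(entry.name[len("memory"):])] = state_file.read_text().strip()
    return dict(sorted(states.items()))

def offline_bytes(root: Path = MEMORY_ROOT) -> int:
    """Total size of the blocks currently marked offline."""
    offline = sum(1 for state in memory_blocks(root).values() if state == "offline")
    return offline * block_size(root)

def online_all(root: Path = MEMORY_ROOT) -> None:
    """Ask the kernel to online every offline block (requires root on a real system)."""
    for number, state in memory_blocks(root).items():
        if state == "offline":
            (root / f"memory{number}" / "state").write_text("online")
```

On a real machine, `offline_bytes() / 2**30` should roughly correspond to the amount of memory, in GiB, that lsmem reports as offline.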

I found out that it was possible to bring the offline memory ranges online (and vice versa) using the chmem utility, e.g.:

Read the rest of this entry »
 

Posted on June 17, 2021 in Debian, Linux, sysadmin

 


A new data structure in town: Maple Tree


Thanks to a recent post on lwn.net, I learned about a new data structure: Maple Tree. Apparently, it’s been in development for the last 1.5 years: “The Maple Tree is a new data structure for Linux that provides an efficient way to store index ranges which map to a single pointer. It is RCU-safe and optimised for modern CPUs. For this application, it outperforms both the existing rbtree and radix tree data structures. The API is inspired by the XArray, and is significantly easier to use than the rbtree. This talk will cover the details of the implementation and show examples of users.”
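To make “index ranges which map to a single pointer” concrete, here is a toy Python range map. It is emphatically not the Maple Tree: the kernel structure is an RCU-safe B-tree variant written in C, while this sketch is a sorted list with binary search (logarithmic lookups, linear-time stores). The `RangeMap` class and its `store_range`/`load` method names are my own, only loosely inspired by the store-a-range, look-up-an-index usage the abstract describes.

```python
import bisect
from typing import Any, Optional

class RangeMap:
    """Toy map from inclusive index ranges [first, last] to a single value."""

    def __init__(self) -> None:
        self._starts: list[int] = []                   # sorted range starts
        self._ranges: list[tuple[int, int, Any]] = []  # (first, last, value), same order

    def store_range(self, first: int, last: int, value: Any) -> None:
        """Map every index in [first, last] to value, trimming or splitting overlaps."""
        if first > last:
            raise ValueError("first must be <= last")
        kept = []
        for lo, hi, v in self._ranges:
            if hi < first or lo > last:      # no overlap: keep unchanged
                kept.append((lo, hi, v))
                continue
            if lo < first:                   # keep the part left of the new range
                kept.append((lo, first - 1, v))
            if hi > last:                    # keep the part right of the new range
                kept.append((last + 1, hi, v))
        kept.append((first, last, value))
        kept.sort(key=lambda r: r[0])
        self._ranges = kept
        self._starts = [r[0] for r in kept]

    def load(self, index: int) -> Optional[Any]:
        """Return the value whose range contains index, or None."""
        i = bisect.bisect_right(self._starts, index) - 1
        if i >= 0:
            lo, hi, v = self._ranges[i]
            if lo <= index <= hi:
                return v
        return None

areas = RangeMap()
areas.store_range(0x1000, 0x1FFF, "area A")
areas.store_range(0x3000, 0x3FFF, "area B")
areas.store_range(0x1800, 0x30FF, "area C")  # overwrites the tail of A and the head of B
print(areas.load(0x17FF), areas.load(0x2000), areas.load(0x3100))  # area A area C area B
```

Because every store trims or splits whatever it overlaps, each index maps to at most one value, which is the property that makes such a structure a natural fit for non-overlapping ranges such as memory areas.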

This is what I could find about this up-and-coming “Maple Tree” data structure for enhancing Linux performance:

The Linux Maple Tree – Matthew Wilcox, Oracle
Read the rest of this entry »
 

Posted on February 15, 2021 in Linux, Programming

 


Unix and Women


I recently came across the names of two women who were active during the birth and early days of Unix, back in the 1970s and 1980s. For future reference, I wanted to note down information about these pioneering women.

“For many people, writing is painful and editing one’s own prose is difficult, tedious, and error-prone. It is often hard to see which parts of a document are difficult to read or how to transform a wordy sentence into a more concise one. It is even harder to discover that one overuses a particular linguistic construct. The system of programs described here helps writers to evaluate documents and to produce better written and more readable prose. The system consists of programs to measure surface features of text that are important to good writing style as well as programs to do some of the tedious jobs of a copy editor. Some of the surface features measured are readability, sentence and word length, sentence type, word usage, and sentence openers. The copy editing programs find spelling errors, wordy phrases, bad diction, some punctuation errors, double words, and split infinitives.”

“Computer aids for writers”, Lorinda Cherry, ACM SIGPLAN Notices, April 1981

Lorinda Cherry and Nina McDonald worked on Writer’s Workbench, among other things, at Bell Labs in the 1970s. I wish the utilities that made up Writer’s Workbench were still available and actively developed as free and open source software, maybe on GitHub (all I could find was this discussion on Hacker News).
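Two of the features listed in Cherry’s abstract, finding double words and measuring sentence length, are simple enough to sketch. The Python functions below are my own illustration, not the original Writer’s Workbench code (which was a set of Unix programs); the function names are hypothetical and the sentence splitting is deliberately naive.

```python
import re

def double_words(text: str) -> list[tuple[int, str]]:
    """Return (line_number, word) for each word that repeats the previous word,
    including repeats split across a line break ("the\\nthe")."""
    hits = []
    prev = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            if prev is not None and word.lower() == prev.lower():
                hits.append((line_no, word))
            prev = word
    return hits

def sentence_stats(text: str) -> tuple[int, float]:
    """Count sentences (naively split on . ! ?) and their mean length in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return len(sentences), mean

text = "The the system finds double words.\nIt also measures sentence length."
print(double_words(text))    # [(1, 'the')]
print(sentence_stats(text))  # (2, 5.5)
```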

According to M. Douglas McIlroy, Lorinda Cherry also contributed to another operating system: Plan 9.

Readers curious about the history of computing can learn more about these women from the following online resources:

Read the rest of this entry »
 

Posted on February 2, 2021 in Programming, History

 


Truth, correctness and utility: an example from Information Theory


I came across the following while doing research on the “data processing inequality”:

From page 19 of “Elements of Information Theory”, Second Edition, 2006, Thomas M. Cover and Joy A. Thomas

As also stated in Scholarpedia’s “Mutual information” article, “Kullback-Leibler divergence is not a true distance: it is not symmetric, and it does not obey the triangle inequality (Cover and Thomas, 1991). It is not hard to show that D_KL(P(z)||Q(z)) is non-negative, and zero if and only if P(z)=Q(z).”

I found this a striking example of an expression that is not true, in fact mathematically wrong, whose underlying concept is nevertheless “useful”, as Cover and Thomas state, as long as you are experienced and well aware of what you’re doing.
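A few lines of Python make both failures of the “distance” reading tangible: the asymmetry and the broken triangle inequality. This is a minimal sketch for discrete distributions, measured in bits; the `kl_divergence` and `bernoulli` helpers and the particular distributions are my own choices for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits for discrete distributions given as sequences.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bernoulli(r):
    """Two-outcome distribution with probability r for the second outcome."""
    return (1 - r, r)

p, q = bernoulli(0.5), bernoulli(0.25)
print(f"D(p||q) = {kl_divergence(p, q):.4f}")  # 0.2075 bits
print(f"D(q||p) = {kl_divergence(q, p):.4f}")  # 0.1887 bits: not symmetric

# Triangle inequality check with three Bernoulli distributions.
a, b, c = bernoulli(0.5), bernoulli(0.4), bernoulli(0.3)
direct = kl_divergence(a, c)
via_b = kl_divergence(a, b) + kl_divergence(b, c)
print(direct > via_b)        # True: the detour through b is "shorter"
print(kl_divergence(p, p))   # 0.0 for identical distributions
```

The last line shows the part that does hold: the divergence is zero when the two distributions are equal, and non-negative otherwise.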

Further Reading:

 

Posted on February 2, 2021 in Books, Math

 


Diacritics restoration: can we do better by using neural networks and deep learning? Perspectives from a 10-year-old open source project


UPDATE (2023-06-14): Now that we’re living in the world of ChatGPT and Large Language Models (LLMs), a software developer, Murat Çorlu, suggested that ChatGPT performs very well at diacritics restoration (deasciification) for Turkish: https://twitter.com/muratcorlu/status/1668335101602848768 He shared his example at https://chat.openai.com/share/3bb666fd-9f35-40df-8efb-9dd0c59bb264. To see whether ChatGPT is really the best (see the accuracy benchmark given in “TABLE IV” below), a nice experiment would be to take a validated Turkish corpus, “asciify” it, feed the output to ChatGPT (e.g. via its API), retrieve the “deasciified” output, compare it to the original corpus, and check what percentage of the text matches the original. If the result turns out to be at least 1-2 percentage points higher than 97.06%, we’ll have a clear winner! 😉 Of course, care should be taken that the initial Turkish corpus is not only validated (all diacritics are correct), but also representative of Turkish usage across many domains, including multilingual texts, texts with heavy foreign terminology, abbreviations, ambiguities, etc.
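The proposed experiment can be sketched in a few lines of Python, under my own assumptions: `deasciify` stands for any text-to-text function (a wrapper around the ChatGPT API, the Turkish Deasciifier, or anything else) and is hypothetical here; accuracy is measured per whitespace-separated word, which may differ from how the accuracy in TABLE IV was computed; and only the letter pairs ç/c, ğ/g, ı/i, ö/o, ş/s, ü/u (plus uppercase) are asciified.

```python
from typing import Callable

# Turkish letters and the ASCII letters typed instead (uppercase İ becomes I).
ASCIIFY = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

def asciify(text: str) -> str:
    """Strip Turkish diacritics, as if the text were typed on a US keyboard."""
    return text.translate(ASCIIFY)

def word_accuracy(reference: str, restored: str) -> float:
    """Percentage of words in `restored` that exactly match `reference`."""
    ref_words, out_words = reference.split(), restored.split()
    if len(ref_words) != len(out_words):
        raise ValueError("restored text must keep the original word boundaries")
    correct = sum(r == o for r, o in zip(ref_words, out_words))
    return 100.0 * correct / len(ref_words)

def evaluate(corpus: str, deasciify: Callable[[str], str]) -> float:
    """Asciify a validated corpus, restore it, and score it against the original."""
    return word_accuracy(corpus, deasciify(asciify(corpus)))

corpus = "Çok güzel bir gün"
print(evaluate(corpus, lambda text: text))  # 25.0: a do-nothing baseline only gets "bir" right
```

With a real, validated corpus, replacing the lambda with a call to the system under test gives the percentage to compare against 97.06%, keeping the metric caveat above in mind.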

People who need to write correctly in languages that have letters with diacritics, such as ‘ğ’, ‘ş’, ‘ö’, ‘ı’, etc., can struggle with standard US or UK QWERTY keyboards, because those keyboard layouts lack such letters. If you also need to switch between languages such as English and Turkish, you know what I mean.

Possible forms of diacritic restoration in Turkish for “aci”. Source: “Diacritic Restoration Using Recurrent Neural Network” by Ayşenur Genç Uzun

The process of taking a piece of text written without correct spelling (using standard ASCII characters, without proper diacritics) and replacing the relevant letters with the correct ones is known as “diacritics restoration”, or “diacritics reconstruction” (or “deASCIIfication” colloquially). About 10 years ago, I wrote a Python program to help people with this: Turkish Deasciifier, a port of the Emacs Lisp code developed by Prof. Deniz Yüret. There’s also a web interface at http://turkceyap.appspot.com.
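The “aci” example in the figure shows why restoration is a disambiguation problem rather than a simple lookup: every ASCII letter that could stand for a Turkish letter doubles the number of possible spellings. A small Python sketch that enumerates those candidates follows; the `VARIANTS` table and `candidates` function are my own illustration, and choosing the right candidate in context is the hard part that the deasciifier, or any other model, has to solve.

```python
from itertools import product

# ASCII letters that may stand for a Turkish letter, with all possible readings.
VARIANTS = {
    "c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü",
    "C": "CÇ", "G": "GĞ", "I": "Iİ", "O": "OÖ", "S": "SŞ", "U": "UÜ",
}

def candidates(word: str) -> list[str]:
    """All spellings a diacritics-restoration system could produce for `word`."""
    choices = [VARIANTS.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*choices)]

print(candidates("aci"))          # ['aci', 'acı', 'açi', 'açı']
print(len(candidates("cocuk")))   # 16 spellings, of which "çocuk" is the intended one
```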

Read the rest of this entry »
 

Posted on October 22, 2020 in Linguistics, Programming, python, Science

 


What is Engineering? Perspectives from “The Sciences of the Artificial”


If you are an engineer, or an engineering manager responsible for designing complex, software-intensive systems, you will find a lot of food for thought in the following quotes from “The Sciences of the Artificial” by Nobel laureate and Turing Award recipient Herbert A. Simon. You might notice that the term ‘software’ never appears in the following quotations, and the word ‘program’ is mentioned only twice. Yet the issues, concerns, methods, and line of reasoning proposed by Simon can be used to attack the core challenges facing software engineers working on different systems in diverse domains. I believe these quotes, like most of the rest of the book, deserve a critical and deep reading by generations of engineers.

“There is nothing special that needs to be said here about resource conservation—cost minimization, for example, as a design criterion. Cost minimization has always been an implicit consideration in the design of engineering structures, but until a few years ago it generally was only implicit, rather than explicit. More and more cost calculations have been brought explicitly into the design procedure, and a strong case can be made today for training design engineers in that body of technique and theory that economists know as “cost-benefit analysis.””

Read the rest of this entry »
 

Posted on October 6, 2020 in business, Management, Programming, Science

 
