Excited to announce that I will be speed demo-ing my latest project today: an ActivityPub data observatory!

This observatory does not collect any user data or metadata. Instead I am looking at the *shape* (aka schema) of data being sent around the fediverse. This will let software devs ask questions like "How is a Mastodon 4.2.0 image post formatted differently from a Misskey 2024.7.0 image post?"

And we'll get real answers based on data rather than on poor documentation.
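
To make that kind of question concrete, here is a rough Python sketch of comparing two scrubbed shapes. It is not the observatory's actual code: the helper names (`flatten`, `diff_shapes`) and both example shapes are invented for illustration and do not describe real Mastodon or Misskey output.

```python
def flatten(shape, prefix=""):
    """Map dotted key paths to type placeholders for a nested shape."""
    paths = {}
    for key, value in shape.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.update(flatten(value, path))
        else:
            paths[path] = value
    return paths


def diff_shapes(a, b):
    """Report fields present in only one shape, or typed differently."""
    flat_a, flat_b = flatten(a), flatten(b)
    shared = flat_a.keys() & flat_b.keys()
    return {
        "only_in_a": sorted(flat_a.keys() - flat_b.keys()),
        "only_in_b": sorted(flat_b.keys() - flat_a.keys()),
        "type_differs": sorted(k for k in shared if flat_a[k] != flat_b[k]),
    }


# Hypothetical shapes from two made-up implementations (assumed data).
shape_a = {
    "content": "<string>",
    "sensitive": "<boolean>",
    "attachment": {"url": "<uri>", "name": "<string>"},
}
shape_b = {
    "content": "<string>",
    "attachment": {"url": "<uri>", "name": None},
    "extra_field": "<string>",
}

report = diff_shapes(shape_a, shape_b)
print(report)
```

Because only placeholders are compared, a diff like this never needs to see the original post contents.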

Darius Kazemi

I won't actually be LAUNCHING this tool until I've found out how you all would feel about it being opt-out vs opt-in. I will provide a longer blog post for you all to read with details, but in short:

It would be really helpful for general interop on the fedi if this were opt-out. But if people are generally freaked out by having technical details about software data formats being opt-out... I'll make it opt-in.

Quick explanation of the data scrubbing in the attached images

An image of some data labeled "WHAT SCRAPERS COLLECT" that contains things like:

  atomUri: 'https://friend.camp/users/darius/statuses/113052195027718205',
  inReplyToAtomUri: null,
  conversation: 'tag:friend.camp,2024-08-30:objectId=26481883:objectType=Conversation',
  localOnly: false,
  content: '<p>On Sep 12 the Applied Social Media Lab (where I work) is hosting an event</p>',

An image of some data labeled "WHAT THIS DATA OBSERVATORY COLLECTS" that contains things like:

  atomUri: '<uri>',
  inReplyToAtomUri: null,
  conversation: '<string>',
  localOnly: <boolean>,
  content: '<string>',
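
The scrubbing described in these images could be sketched roughly as follows in Python. This is a guess at the idea, not the observatory's implementation: the function name `scrub`, the `<number>` and `<unknown>` placeholders, and the rule that only http(s) strings count as `<uri>` are all assumptions.

```python
from urllib.parse import urlparse


def scrub(value):
    """Replace every leaf value with a placeholder naming its type."""
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        # Keep one entry per distinct shape so list length is not revealed.
        shapes = []
        for item in value:
            shape = scrub(item)
            if shape not in shapes:
                shapes.append(shape)
        return shapes
    if value is None:
        return None
    if isinstance(value, bool):  # bool must be checked before int
        return "<boolean>"
    if isinstance(value, (int, float)):
        return "<number>"
    if isinstance(value, str):
        try:
            parsed = urlparse(value)
        except ValueError:
            return "<string>"
        # Assumed heuristic: only http(s) URLs with a host count as URIs.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "<uri>"
        return "<string>"
    return "<unknown>"


# The sample fields from the image descriptions above.
post = {
    "atomUri": "https://friend.camp/users/darius/statuses/113052195027718205",
    "inReplyToAtomUri": None,
    "conversation": "tag:friend.camp,2024-08-30:objectId=26481883:objectType=Conversation",
    "localOnly": False,
    "content": "<p>On Sep 12 the Applied Social Media Lab (where I work) is hosting an event</p>",
}

scrubbed = scrub(post)
print(scrubbed)
```

Field names survive and values do not, which is what makes the output a schema rather than a copy of the post.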
2mo
Seth (PhillyCodeHound)🎙️

@darius Oh this is too cool!

2mo
Evan Prodromou

@darius can you compare to browser.pub?

2mo
Frango

@darius love this, it might be even more useful than a test suite!

2mo
Jenniferplusplus

@darius how do you collect it? Do you just follow a bunch of actors on different software?

2mo
Marco Rogers

@darius this looks cool.

2mo
Schmembot

@darius It would be helpful to non-devs to know how rigorously safety was considered and to what extent it's mitigated before implementation. A few thoughts

• Why is it important to more easily know how the data is shaped?

• Who benefits, and how could/might the resulting info be misused, even if 'anonymized'?

• How much trust is the user expected to have (for example, could scraped PII be leaked or hacked before it's tossed)?

• What are the unintended consequences of enabling this?

2mo